{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "D7tqLMoKF6uq"
   },
   "source": [
    "Deep Learning\n",
    "=============\n",
    "\n",
    "Assignment 6\n",
    "------------\n",
    "\n",
    "After training a skip-gram model in `5_word2vec.ipynb`, the goal of this notebook is to train a LSTM character model over [Text8](http://mattmahoney.net/dc/textdata) data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     }
    },
    "colab_type": "code",
    "collapsed": false,
    "id": "MvEblsgEXxrd"
   },
   "outputs": [],
   "source": [
    "# These are all the modules we'll be using later. Make sure you can import them\n",
    "# before proceeding further.\n",
    "from __future__ import print_function\n",
    "import os\n",
    "import numpy as np\n",
    "import random\n",
    "import string\n",
    "import tensorflow as tf\n",
    "import zipfile\n",
    "from six.moves import range\n",
    "from six.moves.urllib.request import urlretrieve"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "output_extras": [
      {
       "item_id": 1
      }
     ]
    },
    "colab_type": "code",
    "collapsed": false,
    "executionInfo": {
     "elapsed": 5993,
     "status": "ok",
     "timestamp": 1445965582896,
     "user": {
      "color": "#1FA15D",
      "displayName": "Vincent Vanhoucke",
      "isAnonymous": false,
      "isMe": true,
      "permissionId": "05076109866853157986",
      "photoUrl": "//lh6.googleusercontent.com/-cCJa7dTDcgQ/AAAAAAAAAAI/AAAAAAAACgw/r2EZ_8oYer4/s50-c-k-no/photo.jpg",
      "sessionId": "6f6f07b359200c46",
      "userId": "102167687554210253930"
     },
     "user_tz": 420
    },
    "id": "RJ-o3UBUFtCw",
    "outputId": "d530534e-0791-4a94-ca6d-1c8f1b908a9e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found and verified text8.zip\n"
     ]
    }
   ],
   "source": [
    "url = 'http://mattmahoney.net/dc/'\n",
    "\n",
    "def maybe_download(filename, expected_bytes):\n",
    "  \"\"\"Download a file if not present, and make sure it's the right size.\"\"\"\n",
    "  if not os.path.exists(filename):\n",
    "    filename, _ = urlretrieve(url + filename, filename)\n",
    "  statinfo = os.stat(filename)\n",
    "  if statinfo.st_size == expected_bytes:\n",
    "    print('Found and verified %s' % filename)\n",
    "  else:\n",
    "    print(statinfo.st_size)\n",
    "    raise Exception(\n",
    "      'Failed to verify ' + filename + '. Can you get to it with a browser?')\n",
    "  return filename\n",
    "\n",
    "filename = maybe_download('text8.zip', 31344016)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "output_extras": [
      {
       "item_id": 1
      }
     ]
    },
    "colab_type": "code",
    "collapsed": false,
    "executionInfo": {
     "elapsed": 5982,
     "status": "ok",
     "timestamp": 1445965582916,
     "user": {
      "color": "#1FA15D",
      "displayName": "Vincent Vanhoucke",
      "isAnonymous": false,
      "isMe": true,
      "permissionId": "05076109866853157986",
      "photoUrl": "//lh6.googleusercontent.com/-cCJa7dTDcgQ/AAAAAAAAAAI/AAAAAAAACgw/r2EZ_8oYer4/s50-c-k-no/photo.jpg",
      "sessionId": "6f6f07b359200c46",
      "userId": "102167687554210253930"
     },
     "user_tz": 420
    },
    "id": "Mvf09fjugFU_",
    "outputId": "8f75db58-3862-404b-a0c3-799380597390"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data size 100000000\n"
     ]
    }
   ],
   "source": [
    "def read_data(filename):\n",
    "  f = zipfile.ZipFile(filename)\n",
    "  for name in f.namelist():\n",
    "    return tf.compat.as_str(f.read(name))\n",
    "  f.close()\n",
    "  \n",
    "text = read_data(filename)\n",
    "print('Data size %d' % len(text))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ga2CYACE-ghb"
   },
   "source": [
    "Create a small validation set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "output_extras": [
      {
       "item_id": 1
      }
     ]
    },
    "colab_type": "code",
    "collapsed": false,
    "executionInfo": {
     "elapsed": 6184,
     "status": "ok",
     "timestamp": 1445965583138,
     "user": {
      "color": "#1FA15D",
      "displayName": "Vincent Vanhoucke",
      "isAnonymous": false,
      "isMe": true,
      "permissionId": "05076109866853157986",
      "photoUrl": "//lh6.googleusercontent.com/-cCJa7dTDcgQ/AAAAAAAAAAI/AAAAAAAACgw/r2EZ_8oYer4/s50-c-k-no/photo.jpg",
      "sessionId": "6f6f07b359200c46",
      "userId": "102167687554210253930"
     },
     "user_tz": 420
    },
    "id": "w-oBpfFG-j43",
    "outputId": "bdb96002-d021-4379-f6de-a977924f0d02"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "99999000 ons anarchists advocate social relations based upon voluntary as\n",
      "1000  anarchism originated as a term of abuse first used against earl\n"
     ]
    }
   ],
   "source": [
    "valid_size = 1000\n",
    "valid_text = text[:valid_size]\n",
    "train_text = text[valid_size:]\n",
    "train_size = len(train_text)\n",
    "print(train_size, train_text[:64])\n",
    "print(valid_size, valid_text[:64])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Zdw6i4F8glpp"
   },
   "source": [
    "Utility functions to map characters to vocabulary IDs and back."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "output_extras": [
      {
       "item_id": 1
      }
     ]
    },
    "colab_type": "code",
    "collapsed": false,
    "executionInfo": {
     "elapsed": 6276,
     "status": "ok",
     "timestamp": 1445965583249,
     "user": {
      "color": "#1FA15D",
      "displayName": "Vincent Vanhoucke",
      "isAnonymous": false,
      "isMe": true,
      "permissionId": "05076109866853157986",
      "photoUrl": "//lh6.googleusercontent.com/-cCJa7dTDcgQ/AAAAAAAAAAI/AAAAAAAACgw/r2EZ_8oYer4/s50-c-k-no/photo.jpg",
      "sessionId": "6f6f07b359200c46",
      "userId": "102167687554210253930"
     },
     "user_tz": 420
    },
    "id": "gAL1EECXeZsD",
    "outputId": "88fc9032-feb9-45ff-a9a0-a26759cc1f2e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Unexpected character: ï\n",
      "1 26 0 0\n",
      "a z  \n"
     ]
    }
   ],
   "source": [
    "vocabulary_size = len(string.ascii_lowercase) + 1 # [a-z] + ' '\n",
    "first_letter = ord(string.ascii_lowercase[0])\n",
    "\n",
    "def char2id(char):\n",
    "  if char in string.ascii_lowercase:\n",
    "    return ord(char) - first_letter + 1\n",
    "  elif char == ' ':\n",
    "    return 0\n",
    "  else:\n",
    "    print('Unexpected character: %s' % char)\n",
    "    return 0\n",
    "  \n",
    "def id2char(dictid):\n",
    "  if dictid > 0:\n",
    "    return chr(dictid + first_letter - 1)\n",
    "  else:\n",
    "    return ' '\n",
    "\n",
    "print(char2id('a'), char2id('z'), char2id(' '), char2id('ï'))\n",
    "print(id2char(1), id2char(26), id2char(0))"
   ]
  },
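  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick round-trip check of my own (not part of the assignment): any string of\n",
    "lowercase letters and spaces should survive encoding and decoding unchanged,\n",
    "while out-of-vocabulary characters collapse to the id for space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Sanity check: encode a string to ids and decode it back.\n",
    "# Characters outside [a-z ] map to id 0 and so decode as a space.\n",
    "test = 'the quick brown fox'\n",
    "ids = [char2id(c) for c in test]\n",
    "print(ids)\n",
    "print(''.join(id2char(i) for i in ids))"
   ]
  },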
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "lFwoyygOmWsL"
   },
   "source": [
    "Function to generate a training batch for the LSTM model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "output_extras": [
      {
       "item_id": 1
      }
     ]
    },
    "colab_type": "code",
    "collapsed": false,
    "executionInfo": {
     "elapsed": 6473,
     "status": "ok",
     "timestamp": 1445965583467,
     "user": {
      "color": "#1FA15D",
      "displayName": "Vincent Vanhoucke",
      "isAnonymous": false,
      "isMe": true,
      "permissionId": "05076109866853157986",
      "photoUrl": "//lh6.googleusercontent.com/-cCJa7dTDcgQ/AAAAAAAAAAI/AAAAAAAACgw/r2EZ_8oYer4/s50-c-k-no/photo.jpg",
      "sessionId": "6f6f07b359200c46",
      "userId": "102167687554210253930"
     },
     "user_tz": 420
    },
    "id": "d9wMtjy5hCj9",
    "outputId": "3dd79c80-454a-4be0-8b71-4a4a357b3367"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['ons anarchi', 'when milita', 'lleria arch', ' abbeys and', 'married urr', 'hel and ric', 'y and litur', 'ay opened f', 'tion from t', 'migration t', 'new york ot', 'he boeing s', 'e listed wi', 'eber has pr', 'o be made t', 'yer who rec', 'ore signifi', 'a fierce cr', ' two six ei', 'aristotle s', 'ity can be ', ' and intrac', 'tion of the', 'dy to pass ', 'f certain d', 'at it will ', 'e convince ', 'ent told hi', 'ampaign and', 'rver side s', 'ious texts ', 'o capitaliz', 'a duplicate', 'gh ann es d', 'ine january', 'ross zero t', 'cal theorie', 'ast instanc', ' dimensiona', 'most holy m', 't s support', 'u is still ', 'e oscillati', 'o eight sub', 'of italy la', 's the tower', 'klahoma pre', 'erprise lin', 'ws becomes ', 'et in a naz', 'the fabian ', 'etchy to re', ' sharman ne', 'ised empero', 'ting in pol', 'd neo latin', 'th risky ri', 'encyclopedi', 'fense the a', 'duating fro', 'treet grid ', 'ations more', 'appeal of d', 'si have mad']\n",
      "['ists advoca', 'ary governm', 'hes nationa', 'd monasteri', 'raca prince', 'chard baer ', 'rgical lang', 'for passeng', 'the nationa', 'took place ', 'ther well k', 'seven six s', 'ith a gloss', 'robably bee', 'to recogniz', 'ceived the ', 'icant than ', 'ritic of th', 'ight in sig', 's uncaused ', ' lost as in', 'cellular ic', 'e size of t', ' him a stic', 'drugs confu', ' take to co', ' the priest', 'im to name ', 'd barred at', 'standard fo', ' such as es', 'ze on the g', 'e of the or', 'd hiver one', 'y eight mar', 'the lead ch', 'es classica', 'ce the non ', 'al analysis', 'mormons bel', 't or at lea', ' disagreed ', 'ing system ', 'btypes base', 'anguages th', 'r commissio', 'ess one nin', 'nux suse li', ' the first ', 'zi concentr', ' society ne', 'elatively s', 'etworks sha', 'or hirohito', 'litical ini', 'n most of t', 'iskerdoo ri', 'ic overview', 'air compone', 'om acnm acc', ' centerline', 'e than any ', 'devotional ', 'de such dev']\n",
      "[' a']\n",
      "['an']\n"
     ]
    }
   ],
   "source": [
    "batch_size=64\n",
    "num_unrollings=10\n",
    "\n",
    "class BatchGenerator(object):\n",
    "  def __init__(self, text, batch_size, num_unrollings):\n",
    "    self._text = text\n",
    "    self._text_size = len(text)\n",
    "    self._batch_size = batch_size\n",
    "    self._num_unrollings = num_unrollings\n",
    "    segment = self._text_size // batch_size\n",
    "    self._cursor = [ offset * segment for offset in range(batch_size)]\n",
    "    self._last_batch = self._next_batch()\n",
    "  \n",
    "  def _next_batch(self):\n",
    "    \"\"\"Generate a single batch from the current cursor position in the data.\"\"\"\n",
    "    batch = np.zeros(shape=(self._batch_size, vocabulary_size), dtype=np.float)\n",
    "    for b in range(self._batch_size):\n",
    "      batch[b, char2id(self._text[self._cursor[b]])] = 1.0\n",
    "      self._cursor[b] = (self._cursor[b] + 1) % self._text_size\n",
    "    return batch\n",
    "  \n",
    "  def next(self):\n",
    "    \"\"\"Generate the next array of batches from the data. The array consists of\n",
    "    the last batch of the previous array, followed by num_unrollings new ones.\n",
    "    \"\"\"\n",
    "    batches = [self._last_batch]\n",
    "    for step in range(self._num_unrollings):\n",
    "      batches.append(self._next_batch())\n",
    "    self._last_batch = batches[-1]\n",
    "    return batches\n",
    "\n",
    "def characters(probabilities):\n",
    "  \"\"\"Turn a 1-hot encoding or a probability distribution over the possible\n",
    "  characters back into its (most likely) character representation.\"\"\"\n",
    "  return [id2char(c) for c in np.argmax(probabilities, 1)]\n",
    "\n",
    "def batches2string(batches):\n",
    "  \"\"\"Convert a sequence of batches back into their (most likely) string\n",
    "  representation.\"\"\"\n",
    "  s = [''] * batches[0].shape[0]\n",
    "  for b in batches:\n",
    "    s = [''.join(x) for x in zip(s, characters(b))]\n",
    "  return s\n",
    "\n",
    "train_batches = BatchGenerator(train_text, batch_size, num_unrollings)\n",
    "valid_batches = BatchGenerator(valid_text, 1, 1)\n",
    "\n",
    "print(batches2string(train_batches.next()))\n",
    "print(batches2string(train_batches.next()))\n",
    "print(batches2string(valid_batches.next()))\n",
    "print(batches2string(valid_batches.next()))"
   ]
  },
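  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note how the batching works: `BatchGenerator` keeps `batch_size` cursors spaced\n",
    "`segment = len(text) // batch_size` characters apart, so row `b` of every batch\n",
    "always reads from its own contiguous slice of the text. The result is\n",
    "`batch_size` independent character streams, each advancing one position per\n",
    "batch, which is what lets the LSTM state carry over between arrays of batches."
   ]
  },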
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I always find useful to display the shape or the content of the variables to better understand their structure:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(64, 27)\n",
      "1562484\n",
      "26\n",
      "[[ 0.  0.  0.  0.]\n",
      " [ 0.  0.  0.  0.]]\n"
     ]
    }
   ],
   "source": [
    "print(train_batches.next()[1].shape)\n",
    "print(len(train_text) // batch_size)\n",
    "print(len(string.ascii_lowercase))\n",
    "print(np.zeros(shape=(2, 4), dtype=np.float))"
   ]
  },
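  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another property worth verifying (again my own check): each call to `next()`\n",
    "returns `num_unrollings + 1` batches, and the first batch repeats the last\n",
    "batch of the previous call, so consecutive decoded strings overlap by exactly\n",
    "one character."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Consecutive arrays of batches share one character at the boundary,\n",
    "# because the first batch of each array is the carried-over last batch.\n",
    "b1 = batches2string(train_batches.next())\n",
    "b2 = batches2string(train_batches.next())\n",
    "print(b1[0])\n",
    "print(b2[0])\n",
    "print(b1[0][-1] == b2[0][0])  # True: the streams are contiguous"
   ]
  },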
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     }
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "KyVd8FxT5QBc"
   },
   "outputs": [],
   "source": [
    "def logprob(predictions, labels):\n",
    "  \"\"\"Log-probability of the true labels in a predicted batch.\"\"\"\n",
    "  predictions[predictions < 1e-10] = 1e-10\n",
    "  return np.sum(np.multiply(labels, -np.log(predictions))) / labels.shape[0]\n",
    "\n",
    "def sample_distribution(distribution):\n",
    "  \"\"\"Sample one element from a distribution assumed to be an array of normalized\n",
    "  probabilities.\n",
    "  \"\"\"\n",
    "  r = random.uniform(0, 1)\n",
    "  s = 0\n",
    "  for i in range(len(distribution)):\n",
    "    s += distribution[i]\n",
    "    if s >= r:\n",
    "      return i\n",
    "  return len(distribution) - 1\n",
    "\n",
    "def sample(prediction):\n",
    "  \"\"\"Turn a (column) prediction into 1-hot encoded samples.\"\"\"\n",
    "  p = np.zeros(shape=[1, vocabulary_size], dtype=np.float)\n",
    "  p[0, sample_distribution(prediction[0])] = 1.0\n",
    "  return p\n",
    "\n",
    "def random_distribution():\n",
    "  \"\"\"Generate a random column of probabilities.\"\"\"\n",
    "  b = np.random.uniform(0.0, 1.0, size=[1, vocabulary_size])\n",
    "  return b/np.sum(b, 1)[:,None]"
   ]
  },
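  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To build intuition for the perplexity numbers reported below (a small check of\n",
    "my own): a model that predicts the uniform distribution over the 27 characters\n",
    "has per-character cross-entropy $\\ln 27$, so `np.exp(logprob(...))` should come\n",
    "out to exactly 27, the vocabulary size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# A uniform predictor should score a perplexity of vocabulary_size (27).\n",
    "labels = np.eye(vocabulary_size)[:5]  # five arbitrary one-hot labels\n",
    "uniform = np.full((5, vocabulary_size), 1.0 / vocabulary_size)\n",
    "print(np.exp(logprob(uniform, labels)))  # ~27.0"
   ]
  },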
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "K8f67YXaDr4C"
   },
   "source": [
    "Simple LSTM Model."
   ]
  },
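  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the `lstm_cell` below implements the following recurrence\n",
    "(notation matches the variable names in the code; $\\sigma$ is the logistic\n",
    "sigmoid and $\\odot$ is elementwise multiplication):\n",
    "\n",
    "$$\\begin{aligned}\n",
    "i_t &= \\sigma(x_t\\,ix + o_{t-1}\\,im + ib) \\\\\n",
    "f_t &= \\sigma(x_t\\,fx + o_{t-1}\\,fm + fb) \\\\\n",
    "u_t &= x_t\\,cx + o_{t-1}\\,cm + cb \\\\\n",
    "c_t &= f_t \\odot c_{t-1} + i_t \\odot \\tanh(u_t) \\\\\n",
    "g_t &= \\sigma(x_t\\,ox + o_{t-1}\\,om + ob) \\\\\n",
    "o_t &= g_t \\odot \\tanh(c_t)\n",
    "\\end{aligned}$$"
   ]
  },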
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     }
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "Q5rxZK6RDuGe"
   },
   "outputs": [],
   "source": [
    "num_nodes = 64\n",
    "\n",
    "graph = tf.Graph()\n",
    "with graph.as_default():\n",
    "  \n",
    "  # Parameters:\n",
    "  # Input gate: input, previous output, and bias.\n",
    "  ix = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))\n",
    "  im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ib = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Forget gate: input, previous output, and bias.\n",
    "  fx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))\n",
    "  fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  fb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Memory cell: input, state and bias.                             \n",
    "  cx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))\n",
    "  cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  cb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Output gate: input, previous output, and bias.\n",
    "  ox = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))\n",
    "  om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ob = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Variables saving state across unrollings.\n",
    "  saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  # Classifier weights and biases.\n",
    "  w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))\n",
    "  b = tf.Variable(tf.zeros([vocabulary_size]))\n",
    "  \n",
    "  # Definition of the cell computation.\n",
    "  def lstm_cell(i, o, state):\n",
    "    \"\"\"Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf\n",
    "    Note that in this formulation, we omit the various connections between the\n",
    "    previous state and the gates.\"\"\"\n",
    "    input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)\n",
    "    forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)\n",
    "    update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb\n",
    "    state = forget_gate * state + input_gate * tf.tanh(update)\n",
    "    output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)\n",
    "    return output_gate * tf.tanh(state), state\n",
    "\n",
    "  # Input data.\n",
    "  train_data = list()\n",
    "  for _ in range(num_unrollings + 1):\n",
    "    train_data.append(\n",
    "      tf.placeholder(tf.float32, shape=[batch_size,vocabulary_size]))\n",
    "  train_inputs = train_data[:num_unrollings]\n",
    "  train_labels = train_data[1:]  # labels are inputs shifted by one time step.\n",
    "\n",
    "  # Unrolled LSTM loop.\n",
    "  outputs = list()\n",
    "  output = saved_output\n",
    "  state = saved_state\n",
    "  for i in train_inputs:\n",
    "    output, state = lstm_cell(i, output, state)\n",
    "    outputs.append(output)\n",
    "\n",
    "  # State saving across unrollings.\n",
    "  with tf.control_dependencies([saved_output.assign(output),\n",
    "                                saved_state.assign(state)]):\n",
    "    # Classifier.\n",
    "    logits = tf.nn.xw_plus_b(tf.concat(0, outputs), w, b)\n",
    "    loss = tf.reduce_mean(\n",
    "      tf.nn.softmax_cross_entropy_with_logits(\n",
    "        logits, tf.concat(0, train_labels)))\n",
    "\n",
    "  # Optimizer.\n",
    "  global_step = tf.Variable(0)\n",
    "  learning_rate = tf.train.exponential_decay(\n",
    "    10.0, global_step, 5000, 0.1, staircase=True)\n",
    "  optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
    "  gradients, v = zip(*optimizer.compute_gradients(loss))\n",
    "  gradients, _ = tf.clip_by_global_norm(gradients, 1.25)\n",
    "  optimizer = optimizer.apply_gradients(\n",
    "    zip(gradients, v), global_step=global_step)\n",
    "\n",
    "  # Predictions.\n",
    "  train_prediction = tf.nn.softmax(logits)\n",
    "  \n",
    "  # Sampling and validation eval: batch 1, no unrolling.\n",
    "  sample_input = tf.placeholder(tf.float32, shape=[1, vocabulary_size])\n",
    "  saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  reset_sample_state = tf.group(\n",
    "    saved_sample_output.assign(tf.zeros([1, num_nodes])),\n",
    "    saved_sample_state.assign(tf.zeros([1, num_nodes])))\n",
    "  sample_output, sample_state = lstm_cell(\n",
    "    sample_input, saved_sample_output, saved_sample_state)\n",
    "  with tf.control_dependencies([saved_sample_output.assign(sample_output),\n",
    "                                saved_sample_state.assign(sample_state)]):\n",
    "    sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))"
   ]
  },
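  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One detail worth unpacking before training (my own illustration, not from the\n",
    "assignment): `tf.clip_by_global_norm` rescales all gradients jointly so that\n",
    "their combined L2 norm is at most 1.25, which keeps the unrolled LSTM from\n",
    "blowing up on occasional large gradients. The same arithmetic in plain NumPy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# What tf.clip_by_global_norm(gradients, 1.25) computes: if the joint L2\n",
    "# norm exceeds clip_norm, scale every gradient by clip_norm / global_norm.\n",
    "grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # norms 5 and 12\n",
    "global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))  # sqrt(169) = 13\n",
    "clip_norm = 1.25\n",
    "scale = clip_norm / max(global_norm, clip_norm)\n",
    "clipped = [g * scale for g in grads]\n",
    "print(global_norm, np.sqrt(sum(np.sum(g ** 2) for g in clipped)))  # 13.0 1.25"
   ]
  },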
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "cellView": "both",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "output_extras": [
      {
       "item_id": 41
      },
      {
       "item_id": 80
      },
      {
       "item_id": 126
      },
      {
       "item_id": 144
      }
     ]
    },
    "colab_type": "code",
    "collapsed": false,
    "executionInfo": {
     "elapsed": 199909,
     "status": "ok",
     "timestamp": 1445965877333,
     "user": {
      "color": "#1FA15D",
      "displayName": "Vincent Vanhoucke",
      "isAnonymous": false,
      "isMe": true,
      "permissionId": "05076109866853157986",
      "photoUrl": "//lh6.googleusercontent.com/-cCJa7dTDcgQ/AAAAAAAAAAI/AAAAAAAACgw/r2EZ_8oYer4/s50-c-k-no/photo.jpg",
      "sessionId": "6f6f07b359200c46",
      "userId": "102167687554210253930"
     },
     "user_tz": 420
    },
    "id": "RD9zQCZTEaEm",
    "outputId": "5e868466-2532-4545-ce35-b403cf5d9de6"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized\n",
      "Average loss at step 0: 3.296481 learning rate: 10.000000\n",
      "Minibatch perplexity: 27.02\n",
      "================================================================================\n",
      "ysbunengslppeocc gagvepjeqaabtjaazieotn vnyiqvp a ie rwr  m gifvxgrvmrt lanxmytk\n",
      "w oemaiiwforms sxiemlr gnktx eekuauapvvmspaztiezgewieao eirr a kszns me zxgsozsw\n",
      "wsqeqzcxenft  utetqpxqc etnz  at  sb  jfol a   tlcaoeqs  amcjseanr  rna biavplpm\n",
      "bunhys s nh  mcqzstbotrbabi eblnns iqezcbknlevnhpoafbi xrie oze m r tu shosrttd \n",
      "e ote nfivhaamiphqxxragw re aontpagnhpwqxrx theoty ow qovmc x bam nza uerihctfie\n",
      "================================================================================\n",
      "Validation set perplexity: 19.96\n",
      "Average loss at step 100: 2.590483 learning rate: 10.000000\n",
      "Minibatch perplexity: 10.32\n",
      "Validation set perplexity: 10.51\n",
      "Average loss at step 200: 2.245892 learning rate: 10.000000\n",
      "Minibatch perplexity: 9.40\n",
      "Validation set perplexity: 8.99\n",
      "Average loss at step 300: 2.096432 learning rate: 10.000000\n",
      "Minibatch perplexity: 7.45\n",
      "Validation set perplexity: 7.64\n",
      "Average loss at step 400: 2.005932 learning rate: 10.000000\n",
      "Minibatch perplexity: 7.59\n",
      "Validation set perplexity: 7.47\n",
      "Average loss at step 500: 1.937780 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.38\n",
      "Validation set perplexity: 7.11\n",
      "Average loss at step 600: 1.908713 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.30\n",
      "Validation set perplexity: 6.86\n",
      "Average loss at step 700: 1.861192 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.56\n",
      "Validation set perplexity: 6.74\n",
      "Average loss at step 800: 1.820630 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.03\n",
      "Validation set perplexity: 6.62\n",
      "Average loss at step 900: 1.828721 learning rate: 10.000000\n",
      "Minibatch perplexity: 7.15\n",
      "Validation set perplexity: 6.20\n",
      "Average loss at step 1000: 1.823988 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.85\n",
      "================================================================================\n",
      "gereng phs ciptiple two primed counts and the in mecielsticativen flors made at \n",
      "ling time two firvitial hivs disty is fiveding uct has fic zero yxpwarch of thei\n",
      "adiving sidyian of the inframes for a indiblatity highen plich bakioral is hine \n",
      "iver by thea conpraces tichar the one nine seven five one suib with the nations \n",
      "relenes tre mintist it sidlers coorches whill stitiatan grecture trans the benoy\n",
      "================================================================================\n",
      "Validation set perplexity: 6.04\n",
      "Average loss at step 1100: 1.777235 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.52\n",
      "Validation set perplexity: 6.01\n",
      "Average loss at step 1200: 1.750621 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.13\n",
      "Validation set perplexity: 5.64\n",
      "Average loss at step 1300: 1.728072 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.69\n",
      "Validation set perplexity: 5.68\n",
      "Average loss at step 1400: 1.740794 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.81\n",
      "Validation set perplexity: 5.66\n",
      "Average loss at step 1500: 1.736469 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.02\n",
      "Validation set perplexity: 5.47\n",
      "Average loss at step 1600: 1.743287 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.15\n",
      "Validation set perplexity: 5.38\n",
      "Average loss at step 1700: 1.709299 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.63\n",
      "Validation set perplexity: 5.39\n",
      "Average loss at step 1800: 1.675454 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.02\n",
      "Validation set perplexity: 5.34\n",
      "Average loss at step 1900: 1.648865 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.84\n",
      "Validation set perplexity: 5.26\n",
      "Average loss at step 2000: 1.693725 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.84\n",
      "================================================================================\n",
      "ing the potters indeass melans the darker stnut prejaling the guallu singagianis\n",
      "zer was cear in systillp entineash hemper one zero five gatentable tod whet thre\n",
      "retien one three to herevent it yohape plevites in there pardicus and densi and \n",
      "k of effeccially thet of morolips thimelabiner the fursited six three zero four \n",
      "gersings when cluse in his falted to banion from higs hevered deame overen in ui\n",
      "================================================================================\n",
      "Validation set perplexity: 5.18\n",
      "Average loss at step 2100: 1.686851 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.09\n",
      "Validation set perplexity: 4.95\n",
      "Average loss at step 2200: 1.680103 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.94\n",
      "Validation set perplexity: 5.05\n",
      "Average loss at step 2300: 1.640816 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.62\n",
      "Validation set perplexity: 4.87\n",
      "Average loss at step 2400: 1.659278 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.26\n",
      "Validation set perplexity: 4.84\n",
      "Average loss at step 2500: 1.676898 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.91\n",
      "Validation set perplexity: 4.63\n",
      "Average loss at step 2600: 1.651337 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.78\n",
      "Validation set perplexity: 4.79\n",
      "Average loss at step 2700: 1.651017 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.11\n",
      "Validation set perplexity: 4.59\n",
      "Average loss at step 2800: 1.649643 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.96\n",
      "Validation set perplexity: 4.54\n",
      "Average loss at step 2900: 1.647356 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.63\n",
      "Validation set perplexity: 4.55\n",
      "Average loss at step 3000: 1.649903 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.76\n",
      "================================================================================\n",
      "fication thas periov in yeras of aprrombay companishist des ty cirstic usring ye\n",
      "ularid of s inhe with that however one seven two praces tyondy and roal see air \n",
      "zed it old b withising one trial and contory writh howeven s become abreptions t\n",
      "bbec of have and staten scountating the greaser or a that becount with thy ho li\n",
      "plitic and augu playes to every revist more new transfartial clutht epoce awsist\n",
      "================================================================================\n",
      "Validation set perplexity: 4.71\n",
      "Average loss at step 3100: 1.631064 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.73\n",
      "Validation set perplexity: 4.65\n",
      "Average loss at step 3200: 1.646096 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.09\n",
      "Validation set perplexity: 4.61\n",
      "Average loss at step 3300: 1.640934 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.73\n",
      "Validation set perplexity: 4.46\n",
      "Average loss at step 3400: 1.669624 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.08\n",
      "Validation set perplexity: 4.61\n",
      "Average loss at step 3500: 1.654194 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.64\n",
      "Validation set perplexity: 4.58\n",
      "Average loss at step 3600: 1.666263 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.99\n",
      "Validation set perplexity: 4.46\n",
      "Average loss at step 3700: 1.645868 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.56\n",
      "Validation set perplexity: 4.46\n",
      "Average loss at step 3800: 1.643696 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.88\n",
      "Validation set perplexity: 4.57\n",
      "Average loss at step 3900: 1.637875 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.21\n",
      "Validation set perplexity: 4.59\n",
      "Average loss at step 4000: 1.643843 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.74\n",
      "================================================================================\n",
      "perments are meforchoun portable goedromor and madrics prasticlion would tendes \n",
      "in virlaitura of the oventry firsis intenled gid found and the hivican live six \n",
      "most it emportinuston at calawen laits the intean ase indation chactuge berog so\n",
      "king with peocual literial bejum so elfest compliar profaction have its are howe\n",
      "d in only albathed and as the tases noctien a progreated order of the loza were \n",
      "================================================================================\n",
      "Validation set perplexity: 4.54\n",
      "Average loss at step 4100: 1.630176 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.27\n",
      "Validation set perplexity: 4.71\n",
      "Average loss at step 4200: 1.633497 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.92\n",
      "Validation set perplexity: 4.46\n",
      "Average loss at step 4300: 1.614631 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.19\n",
      "Validation set perplexity: 4.52\n",
      "Average loss at step 4400: 1.608853 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.82\n",
      "Validation set perplexity: 4.29\n",
      "Average loss at step 4500: 1.610827 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.15\n",
      "Validation set perplexity: 4.43\n",
      "Average loss at step 4600: 1.611605 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.15\n",
      "Validation set perplexity: 4.44\n",
      "Average loss at step 4700: 1.622293 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.11\n",
      "Validation set perplexity: 4.42\n",
      "Average loss at step 4800: 1.622890 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.01\n",
      "Validation set perplexity: 4.48\n",
      "Average loss at step 4900: 1.631913 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.90\n",
      "Validation set perplexity: 4.56\n",
      "Average loss at step 5000: 1.606248 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.40\n",
      "================================================================================\n",
      "ecally shan into mahk that the one foughtibn of spirp under of aid and more defi\n",
      "land but line hore which conventer of the committorchen bus king despire of two \n",
      "d the animan portaing on skesside over jastu this bay westorder isilor one six t\n",
      "que following not mihs often statistophis they sore a mbschishap foul made a pen\n",
      "ricted widst are is pathosed four the letter shied unetratid ade deficien sammeg\n",
      "================================================================================\n",
      "Validation set perplexity: 4.58\n",
      "Average loss at step 5100: 1.603605 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.99\n",
      "Validation set perplexity: 4.38\n",
      "Average loss at step 5200: 1.591440 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.68\n",
      "Validation set perplexity: 4.34\n",
      "Average loss at step 5300: 1.577866 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.60\n",
      "Validation set perplexity: 4.31\n",
      "Average loss at step 5400: 1.575936 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.34\n",
      "Validation set perplexity: 4.29\n",
      "Average loss at step 5500: 1.567288 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.07\n",
      "Validation set perplexity: 4.27\n",
      "Average loss at step 5600: 1.573526 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.56\n",
      "Validation set perplexity: 4.26\n",
      "Average loss at step 5700: 1.563047 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.66\n",
      "Validation set perplexity: 4.26\n",
      "Average loss at step 5800: 1.579404 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.01\n",
      "Validation set perplexity: 4.26\n",
      "Average loss at step 5900: 1.572157 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.48\n",
      "Validation set perplexity: 4.25\n",
      "Average loss at step 6000: 1.546779 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.89\n",
      "================================================================================\n",
      "vincy whiles kew had goftrated bayi for by alper people conswardinams is draphs \n",
      "zer one nine five six six as a cadi with is abrict of two five whone syrbuc the \n",
      "wer thenry newpereeches music s was on a tupk the nine him attembly a marmary mo\n",
      "can a penorty includes for  edections that the one six six five nine nine two on\n",
      "hull b englingle are a proposted hu had and one nine seven five liberidary and c\n",
      "================================================================================\n",
      "Validation set perplexity: 4.24\n",
      "Average loss at step 6100: 1.564810 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.16\n",
      "Validation set perplexity: 4.21\n",
      "Average loss at step 6200: 1.539832 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.40\n",
      "Validation set perplexity: 4.22\n",
      "Average loss at step 6300: 1.543234 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.61\n",
      "Validation set perplexity: 4.19\n",
      "Average loss at step 6400: 1.539422 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.94\n",
      "Validation set perplexity: 4.20\n",
      "Average loss at step 6500: 1.553795 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.40\n",
      "Validation set perplexity: 4.19\n",
      "Average loss at step 6600: 1.592802 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.74\n",
      "Validation set perplexity: 4.21\n",
      "Average loss at step 6700: 1.575369 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.41\n",
      "Validation set perplexity: 4.20\n",
      "Average loss at step 6800: 1.601404 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.91\n",
      "Validation set perplexity: 4.19\n",
      "Average loss at step 6900: 1.578345 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.60\n",
      "Validation set perplexity: 4.22\n",
      "Average loss at step 7000: 1.572104 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.82\n",
      "================================================================================\n",
      "quishes and as an ourcifics be azardaing incoporsion is shows lought that slow b\n",
      "x msd cut yight chradgen people eduslieds and got washards proless cop profess i\n",
      "warding practonism of the onles docyx locking the first part the volomownada rea\n",
      "q him hand hey have american had after about purriess between margos remarnity w\n",
      "h programmatius by the bortination astronary and samitudy is impreversings makpa\n",
      "================================================================================\n",
      "Validation set perplexity: 4.19\n"
     ]
    }
   ],
   "source": [
    "num_steps = 7001\n",
    "summary_frequency = 100\n",
    "\n",
    "with tf.Session(graph=graph) as session:\n",
    "  tf.initialize_all_variables().run()\n",
    "  print('Initialized')\n",
    "  mean_loss = 0\n",
    "  for step in range(num_steps):\n",
    "    batches = train_batches.next()\n",
    "    feed_dict = dict()\n",
    "    for i in range(num_unrollings + 1):\n",
    "      feed_dict[train_data[i]] = batches[i]\n",
    "    _, l, predictions, lr = session.run(\n",
    "      [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)\n",
    "    mean_loss += l\n",
    "    if step % summary_frequency == 0:\n",
    "      if step > 0:\n",
    "        mean_loss = mean_loss / summary_frequency\n",
    "      # The mean loss is an estimate of the loss over the last few batches.\n",
    "      print(\n",
    "        'Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))\n",
    "      mean_loss = 0\n",
    "      labels = np.concatenate(list(batches)[1:])\n",
    "      print('Minibatch perplexity: %.2f' % float(\n",
    "        np.exp(logprob(predictions, labels))))\n",
    "      if step % (summary_frequency * 10) == 0:\n",
    "        # Generate some samples.\n",
    "        print('=' * 80)\n",
    "        for _ in range(5):\n",
    "          feed = sample(random_distribution())\n",
    "          sentence = characters(feed)[0]\n",
    "          reset_sample_state.run()\n",
    "          for _ in range(79):\n",
    "            prediction = sample_prediction.eval({sample_input: feed})\n",
    "            feed = sample(prediction)\n",
    "            sentence += characters(feed)[0]\n",
    "          print(sentence)\n",
    "        print('=' * 80)\n",
    "      # Measure validation set perplexity.\n",
    "      reset_sample_state.run()\n",
    "      valid_logprob = 0\n",
    "      for _ in range(valid_size):\n",
    "        b = valid_batches.next()\n",
    "        predictions = sample_prediction.eval({sample_input: b[0]})\n",
    "        valid_logprob = valid_logprob + logprob(predictions, b[1])\n",
    "      print('Validation set perplexity: %.2f' % float(np.exp(\n",
    "        valid_logprob / valid_size)))"
   ]
  },
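  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two sanity checks on these numbers: the step-0 minibatch perplexity of 27.02 is\n",
    "essentially the vocabulary size, which is what a freshly initialized,\n",
    "near-uniform model should give; and the final validation perplexity of 4.19\n",
    "corresponds to a cross-entropy of $\\ln 4.19 \\approx 1.43$ nats, i.e. about\n",
    "$1.43 / \\ln 2 \\approx 2.07$ bits per character."
   ]
  },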
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "pl4vtmFfa5nn"
   },
   "source": [
    "---\n",
    "Problem 1\n",
    "---------\n",
    "\n",
    "You might have noticed that the definition of the LSTM cell involves 4 matrix multiplications with the input, and 4 matrix multiplications with the output. Simplify the expression by using a single matrix multiply for each, and variables that are 4 times larger.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "num_nodes = 64\n",
    "\n",
    "graph = tf.Graph()\n",
    "with graph.as_default():\n",
    "  \n",
    "  # Parameters:\n",
    "  # Input gate: input, previous output, and bias.\n",
    "  ix = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))\n",
    "  im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ib = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Forget gate: input, previous output, and bias.\n",
    "  fx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))\n",
    "  fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  fb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Memory cell: input, state and bias.                             \n",
    "  cx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))\n",
    "  cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  cb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Output gate: input, previous output, and bias.\n",
    "  ox = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))\n",
    "  om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ob = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Concatenate parameters  \n",
    "  sx = tf.concat(1, [ix, fx, cx, ox])\n",
    "  sm = tf.concat(1, [im, fm, cm, om])\n",
    "  sb = tf.concat(1, [ib, fb, cb, ob])\n",
    "  # Variables saving state across unrollings.\n",
    "  saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  # Classifier weights and biases.\n",
    "  w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))\n",
    "  b = tf.Variable(tf.zeros([vocabulary_size]))\n",
    "  \n",
    "  # Definition of the cell computation.\n",
    "  def lstm_cell(i, o, state):\n",
    "    \"\"\"Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf\n",
    "    Note that in this formulation, we omit the various connections between the\n",
    "    previous state and the gates.\"\"\"\n",
    "    smatmul = tf.matmul(i, sx) + tf.matmul(o, sm) + sb\n",
    "    smatmul_input, smatmul_forget, update, smatmul_output = tf.split(1, 4, smatmul)\n",
    "    input_gate = tf.sigmoid(smatmul_input)\n",
    "    forget_gate = tf.sigmoid(smatmul_forget)\n",
    "    output_gate = tf.sigmoid(smatmul_output)\n",
    "    #input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)\n",
    "    #forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)\n",
    "    #update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb\n",
    "    state = forget_gate * state + input_gate * tf.tanh(update)\n",
    "    #output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)\n",
    "    return output_gate * tf.tanh(state), state\n",
    "\n",
    "  # Input data.\n",
    "  train_data = list()\n",
    "  for _ in range(num_unrollings + 1):\n",
    "    train_data.append(\n",
    "      tf.placeholder(tf.float32, shape=[batch_size,vocabulary_size]))\n",
    "  train_inputs = train_data[:num_unrollings]\n",
    "  train_labels = train_data[1:]  # labels are inputs shifted by one time step.\n",
    "\n",
    "  # Unrolled LSTM loop.\n",
    "  outputs = list()\n",
    "  output = saved_output\n",
    "  state = saved_state\n",
    "  for i in train_inputs:\n",
    "    output, state = lstm_cell(i, output, state)\n",
    "    outputs.append(output)\n",
    "\n",
    "  # State saving across unrollings.\n",
    "  with tf.control_dependencies([saved_output.assign(output),\n",
    "                                saved_state.assign(state)]):\n",
    "    # Classifier.\n",
    "    logits = tf.nn.xw_plus_b(tf.concat(0, outputs), w, b)\n",
    "    loss = tf.reduce_mean(\n",
    "      tf.nn.softmax_cross_entropy_with_logits(\n",
    "        logits, tf.concat(0, train_labels)))\n",
    "\n",
    "  # Optimizer.\n",
    "  global_step = tf.Variable(0)\n",
    "  learning_rate = tf.train.exponential_decay(\n",
    "    10.0, global_step, 5000, 0.1, staircase=True)\n",
    "  optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
    "  gradients, v = zip(*optimizer.compute_gradients(loss))\n",
    "  gradients, _ = tf.clip_by_global_norm(gradients, 1.25)\n",
    "  optimizer = optimizer.apply_gradients(\n",
    "    zip(gradients, v), global_step=global_step)\n",
    "\n",
    "  # Predictions.\n",
    "  train_prediction = tf.nn.softmax(logits)\n",
    "  \n",
    "  # Sampling and validation eval: batch 1, no unrolling.\n",
    "  sample_input = tf.placeholder(tf.float32, shape=[1, vocabulary_size])\n",
    "  saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  reset_sample_state = tf.group(\n",
    "    saved_sample_output.assign(tf.zeros([1, num_nodes])),\n",
    "    saved_sample_state.assign(tf.zeros([1, num_nodes])))\n",
    "  sample_output, sample_state = lstm_cell(\n",
    "    sample_input, saved_sample_output, saved_sample_state)\n",
    "  with tf.control_dependencies([saved_sample_output.assign(sample_output),\n",
    "                                saved_sample_state.assign(sample_state)]):\n",
    "    sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))"
   ]
  },
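  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick NumPy check of the algebra behind this refactoring (my own sketch,\n",
    "independent of the TF graph): concatenating the four weight matrices along\n",
    "columns and doing a single matmul yields the same four blocks as four separate\n",
    "matmuls, which is why `tf.split(1, 4, smatmul)` recovers the per-gate\n",
    "pre-activations. Note this solution keeps the original per-gate variables and\n",
    "concatenates them once; one could instead declare `sx`, `sm`, `sb` directly as\n",
    "single 4x-wide variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# One matmul against column-concatenated weights equals four matmuls.\n",
    "rng = np.random.RandomState(0)\n",
    "x = rng.randn(3, 5)                       # a toy input batch\n",
    "ws = [rng.randn(5, 2) for _ in range(4)]  # four per-gate weight matrices\n",
    "big = np.concatenate(ws, axis=1)          # shape (5, 8), like sx above\n",
    "blocks = np.split(x.dot(big), 4, axis=1)  # four (3, 2) blocks\n",
    "print(all(np.allclose(b, x.dot(w)) for b, w in zip(blocks, ws)))  # True"
   ]
  },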
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized\n",
      "Average loss at step 0: 3.297115 learning rate: 10.000000\n",
      "Minibatch perplexity: 27.03\n",
      "================================================================================\n",
      "yafqiklmzuicdll tyqzeqmblto juwh knmeuy  jt et  loqezts kave qleevefbsegririkidu\n",
      "ah   xo c ufe dre y ai knq rc lf ugleeninvedxkhfkzo tyfheeeczltkso e ooedncbepgk\n",
      "wcpeal bscdbpaeeh de ixgequ hyeiabbxvseyeyezkhlxiisemcnqahfxcoprtnyvir oceeaeyv \n",
      "fvaesz tbat ssokiqn  xnnpoz it isisgxdzqjni teyieangbnrep ldsjg ghxufx  gxe mebv\n",
      "khoseotwxo eov hdcewtq zj olqahxfdfnld e mtnh  qhaqfvsfiggebbtoen miowodtnihg th\n",
      "================================================================================\n",
      "Validation set perplexity: 20.20\n",
      "Average loss at step 100: 2.596444 learning rate: 10.000000\n",
      "Minibatch perplexity: 10.63\n",
      "Validation set perplexity: 11.01\n",
      "Average loss at step 200: 2.256601 learning rate: 10.000000\n",
      "Minibatch perplexity: 9.29\n",
      "Validation set perplexity: 9.01\n",
      "Average loss at step 300: 2.091136 learning rate: 10.000000\n",
      "Minibatch perplexity: 7.43\n",
      "Validation set perplexity: 8.21\n",
      "Average loss at step 400: 2.033858 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.94\n",
      "Validation set perplexity: 7.70\n",
      "Average loss at step 500: 1.979009 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.80\n",
      "Validation set perplexity: 7.15\n",
      "Average loss at step 600: 1.891969 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.55\n",
      "Validation set perplexity: 6.85\n",
      "Average loss at step 700: 1.865489 learning rate: 10.000000\n",
      "Minibatch perplexity: 7.15\n",
      "Validation set perplexity: 6.51\n",
      "Average loss at step 800: 1.863540 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.81\n",
      "Validation set perplexity: 6.43\n",
      "Average loss at step 900: 1.840830 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.57\n",
      "Validation set perplexity: 6.26\n",
      "Average loss at step 1000: 1.834599 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.26\n",
      "================================================================================\n",
      "s dnocigat dwilvac liltanimally  as arter progugatary of in are one pateal chole\n",
      " aikinger not man jochial stalt geot caire these carrehts one one five even zero\n",
      "ce prenife out fill signary wlloson num nine in gridth b lith and home over one \n",
      "it caletical a proqued to to palbodchd thut uinsinelord as eahn but dadachser th\n",
      "z a prochral quring retiric turears wresh it alto six virrows not prolept bustak\n",
      "================================================================================\n",
      "Validation set perplexity: 6.04\n",
      "Average loss at step 1100: 1.792052 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.63\n",
      "Validation set perplexity: 6.01\n",
      "Average loss at step 1200: 1.767267 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.20\n",
      "Validation set perplexity: 5.95\n",
      "Average loss at step 1300: 1.756417 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.67\n",
      "Validation set perplexity: 5.79\n",
      "Average loss at step 1400: 1.759507 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.03\n",
      "Validation set perplexity: 5.66\n",
      "Average loss at step 1500: 1.740990 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.66\n",
      "Validation set perplexity: 5.40\n",
      "Average loss at step 1600: 1.728305 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.28\n",
      "Validation set perplexity: 5.59\n",
      "Average loss at step 1700: 1.711466 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.89\n",
      "Validation set perplexity: 5.42\n",
      "Average loss at step 1800: 1.685569 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.52\n",
      "Validation set perplexity: 5.37\n",
      "Average loss at step 1900: 1.696628 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.67\n",
      "Validation set perplexity: 5.23\n",
      "Average loss at step 2000: 1.677520 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.62\n",
      "================================================================================\n",
      "linopogte womeffice iss nota inclutising gudaticalibed s bubloupa and tent of on\n",
      "kin to itreless is user wornnoliteci file keach shobiopors repomits in fromestin\n",
      "dinatic placiles bot diost froulh one nine interness one nines one five seven ni\n",
      "mouth was whele coen with act lerger there altixe ive imbolison as histoxami com\n",
      "renly in trle its orevanional nation from atticus nation assomentaliant of letti\n",
      "================================================================================\n",
      "Validation set perplexity: 5.35\n",
      "Average loss at step 2100: 1.686659 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.71\n",
      "Validation set perplexity: 5.18\n",
      "Average loss at step 2200: 1.699840 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.03\n",
      "Validation set perplexity: 5.04\n",
      "Average loss at step 2300: 1.701436 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.61\n",
      "Validation set perplexity: 5.04\n",
      "Average loss at step 2400: 1.678378 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.78\n",
      "Validation set perplexity: 4.98\n",
      "Average loss at step 2500: 1.690642 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.21\n",
      "Validation set perplexity: 4.99\n",
      "Average loss at step 2600: 1.667896 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.33\n",
      "Validation set perplexity: 4.96\n",
      "Average loss at step 2700: 1.678243 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.61\n",
      "Validation set perplexity: 4.99\n",
      "Average loss at step 2800: 1.674157 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.39\n",
      "Validation set perplexity: 5.13\n",
      "Average loss at step 2900: 1.672686 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.26\n",
      "Validation set perplexity: 5.01\n",
      "Average loss at step 3000: 1.683866 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.16\n",
      "================================================================================\n",
      "h ditished is akeriad inspolusin anirace vinzubctivilics procists populanca yaad\n",
      "yons frequentyes augheved funtorn the distant in the loseros of sillow beganilat\n",
      "wert prenfs kand where tograte subam advenction islee is permany educional pario\n",
      "ungany severative the among that to sevent the sleasings units clanded agries st\n",
      "ppholed be refective that nims meike frenchona textliff forcesendor in pre natem\n",
      "================================================================================\n",
      "Validation set perplexity: 4.93\n",
      "Average loss at step 3100: 1.647754 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.99\n",
      "Validation set perplexity: 4.98\n",
      "Average loss at step 3200: 1.629427 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.52\n",
      "Validation set perplexity: 4.97\n",
      "Average loss at step 3300: 1.643257 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.44\n",
      "Validation set perplexity: 4.81\n",
      "Average loss at step 3400: 1.623889 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.15\n",
      "Validation set perplexity: 4.96\n",
      "Average loss at step 3500: 1.669602 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.69\n",
      "Validation set perplexity: 4.96\n",
      "Average loss at step 3600: 1.647169 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.29\n",
      "Validation set perplexity: 4.75\n",
      "Average loss at step 3700: 1.650060 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.65\n",
      "Validation set perplexity: 4.91\n",
      "Average loss at step 3800: 1.653223 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.78\n",
      "Validation set perplexity: 4.79\n",
      "Average loss at step 3900: 1.647023 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.11\n",
      "Validation set perplexity: 4.91\n",
      "Average loss at step 4000: 1.636907 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.31\n",
      "================================================================================\n",
      "kipage lazin russive acis fative ufic doe one six two s han all seritace one hom\n",
      "tional argent that it ik transparaed somee thing many course for the archerages \n",
      "ight with the new frisone a famoustal for cles ambio conseng one five zero to a \n",
      "is ganadua dianchumamar irams trae recognite and wigh in award folloms ra swarth\n",
      "fucting the famout of ecesparahs thus both statide serving by pociluted candarg \n",
      "================================================================================\n",
      "Validation set perplexity: 4.78\n",
      "Average loss at step 4100: 1.616329 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.73\n",
      "Validation set perplexity: 4.54\n",
      "Average loss at step 4200: 1.609911 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.27\n",
      "Validation set perplexity: 4.73\n",
      "Average loss at step 4300: 1.617847 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.09\n",
      "Validation set perplexity: 4.75\n",
      "Average loss at step 4400: 1.605607 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.89\n",
      "Validation set perplexity: 4.71\n",
      "Average loss at step 4500: 1.637116 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.23\n",
      "Validation set perplexity: 4.77\n",
      "Average loss at step 4600: 1.622026 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.78\n",
      "Validation set perplexity: 4.70\n",
      "Average loss at step 4700: 1.616178 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.87\n",
      "Validation set perplexity: 4.72\n",
      "Average loss at step 4800: 1.609194 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.91\n",
      "Validation set perplexity: 4.70\n",
      "Average loss at step 4900: 1.616821 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.67\n",
      "Validation set perplexity: 4.51\n",
      "Average loss at step 5000: 1.609718 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.52\n",
      "================================================================================\n",
      "an resuld of the own they cosm the one nine eight iria such are prited discuting\n",
      "jests frbm with the zaudjist waters are art and sate denariol and belend on give\n",
      "one us speeshremen contenturus durious s innucting as the emb distinne is ervint\n",
      "ed on the develoost of anstalored great which tigan statari in host rotem in tra\n",
      "f two onising two zero zero six flegj one nine nine six duepmor one six smesp to\n",
      "================================================================================\n",
      "Validation set perplexity: 4.73\n",
      "Average loss at step 5100: 1.588180 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.00\n",
      "Validation set perplexity: 4.57\n",
      "Average loss at step 5200: 1.589828 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.13\n",
      "Validation set perplexity: 4.54\n",
      "Average loss at step 5300: 1.588764 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.47\n",
      "Validation set perplexity: 4.54\n",
      "Average loss at step 5400: 1.586965 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.81\n",
      "Validation set perplexity: 4.52\n",
      "Average loss at step 5500: 1.583831 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.63\n",
      "Validation set perplexity: 4.49\n",
      "Average loss at step 5600: 1.554910 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.73\n",
      "Validation set perplexity: 4.46\n",
      "Average loss at step 5700: 1.572744 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.59\n",
      "Validation set perplexity: 4.45\n",
      "Average loss at step 5800: 1.593644 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.05\n",
      "Validation set perplexity: 4.47\n",
      "Average loss at step 5900: 1.578733 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.78\n",
      "Validation set perplexity: 4.48\n",
      "Average loss at step 6000: 1.577548 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.55\n",
      "================================================================================\n",
      "frimiarsed readinall distordingah labon janes to the milatives suffichanged s ba\n",
      "ry henre triev files in the explivarity of the orpan to the urchicial pifsion co\n",
      " puble enown despror has finds whethest of air me one amproron plants avamon to \n",
      "an lared paranneveder pubrancy awerates and in hungudge in the yous to s of the \n",
      "s huldia to a canal in the distance gamenal da set occologen be workfication bas\n",
      "================================================================================\n",
      "Validation set perplexity: 4.45\n",
      "Average loss at step 6100: 1.574233 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.21\n",
      "Validation set perplexity: 4.50\n",
      "Average loss at step 6200: 1.585383 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.81\n",
      "Validation set perplexity: 4.53\n",
      "Average loss at step 6300: 1.587027 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.95\n",
      "Validation set perplexity: 4.53\n",
      "Average loss at step 6400: 1.566589 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.85\n",
      "Validation set perplexity: 4.52\n",
      "Average loss at step 6500: 1.554185 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.61\n",
      "Validation set perplexity: 4.53\n",
      "Average loss at step 6600: 1.600431 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.99\n",
      "Validation set perplexity: 4.51\n",
      "Average loss at step 6700: 1.564465 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.71\n",
      "Validation set perplexity: 4.50\n",
      "Average loss at step 6800: 1.574971 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.34\n",
      "Validation set perplexity: 4.55\n",
      "Average loss at step 6900: 1.566624 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.85\n",
      "Validation set perplexity: 4.48\n",
      "Average loss at step 7000: 1.584673 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.10\n",
      "================================================================================\n",
      "phout one nine alone nine from he he burb with the sropes ver united out then we\n",
      "gubinising albitt druma paptional good known of speciprenublations aga widhabans\n",
      "ques now romairs that have e cipatical necelogated fighte pain bolly mag named s\n",
      "bolsce butnines to the four zero five three one nine nine estomogopeoge instire \n",
      "dure on the origin lingla in the hardogn to the was imborly univeory so shunt am\n",
      "================================================================================\n",
      "Validation set perplexity: 4.50\n"
     ]
    }
   ],
   "source": [
    "num_steps = 7001\n",
    "summary_frequency = 100\n",
    "\n",
    "with tf.Session(graph=graph) as session:\n",
    "  tf.initialize_all_variables().run()\n",
    "  print('Initialized')\n",
    "  mean_loss = 0\n",
    "  for step in range(num_steps):\n",
    "    batches = train_batches.next()\n",
    "    feed_dict = dict()\n",
    "    for i in range(num_unrollings + 1):\n",
    "      feed_dict[train_data[i]] = batches[i]\n",
    "    _, l, predictions, lr = session.run(\n",
    "      [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)\n",
    "    mean_loss += l\n",
    "    if step % summary_frequency == 0:\n",
    "      if step > 0:\n",
    "        mean_loss = mean_loss / summary_frequency\n",
    "      # The mean loss is an estimate of the loss over the last few batches.\n",
    "      print(\n",
    "        'Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))\n",
    "      mean_loss = 0\n",
    "      labels = np.concatenate(list(batches)[1:])\n",
    "      print('Minibatch perplexity: %.2f' % float(\n",
    "        np.exp(logprob(predictions, labels))))\n",
    "      if step % (summary_frequency * 10) == 0:\n",
    "        # Generate some samples.\n",
    "        print('=' * 80)\n",
    "        for _ in range(5):\n",
    "          feed = sample(random_distribution())\n",
    "          sentence = characters(feed)[0]\n",
    "          reset_sample_state.run()\n",
    "          for _ in range(79):\n",
    "            prediction = sample_prediction.eval({sample_input: feed})\n",
    "            feed = sample(prediction)\n",
    "            sentence += characters(feed)[0]\n",
    "          print(sentence)\n",
    "        print('=' * 80)\n",
    "      # Measure validation set perplexity.\n",
    "      reset_sample_state.run()\n",
    "      valid_logprob = 0\n",
    "      for _ in range(valid_size):\n",
    "        b = valid_batches.next()\n",
    "        predictions = sample_prediction.eval({sample_input: b[0]})\n",
    "        valid_logprob = valid_logprob + logprob(predictions, b[1])\n",
    "      print('Validation set perplexity: %.2f' % float(np.exp(\n",
    "        valid_logprob / valid_size)))"
   ]
  },
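  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The perplexity printed above is just the exponential of the average per-character negative log-probability, i.e. the quantity that ``np.exp(logprob(predictions, labels))`` computes. A minimal NumPy sketch of that computation, assuming ``predictions`` holds rows of softmax probabilities and ``labels`` the matching one-hot rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def perplexity_sketch(predictions, labels):\n",
    "  # -log p(true character) per row, averaged over rows, then exponentiated.\n",
    "  true_probs = np.sum(predictions * labels, axis=1)\n",
    "  return float(np.exp(-np.mean(np.log(true_probs))))\n",
    "\n",
    "# Sanity check: a uniform model over 27 characters has perplexity 27.\n",
    "uniform = np.full((4, 27), 1.0 / 27)\n",
    "one_hot = np.eye(27)[:4]\n",
    "print(perplexity_sketch(uniform, one_hot))  # ~27.0"
   ]
  },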
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "4eErTCTybtph"
   },
   "source": [
    "---\n",
    "Problem 2\n",
    "---------\n",
    "\n",
    "We want to train a LSTM over bigrams, that is pairs of consecutive characters like 'ab' instead of single characters like 'a'. Since the number of possible bigrams is large, feeding them directly to the LSTM using 1-hot encodings will lead to a very sparse representation that is very wasteful computationally.\n",
    "\n",
    "a- Introduce an embedding lookup on the inputs, and feed the embeddings to the LSTM cell instead of the inputs themselves.\n",
    "\n",
    "b- Write a bigram-based LSTM, modeled on the character LSTM above.\n",
    "\n",
    "c- Introduce Dropout. For best practices on how to use Dropout in LSTMs, refer to this [article](http://arxiv.org/abs/1409.2329).\n",
    "\n",
    "---"
   ]
  },
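  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before tackling part a, it helps to see why an embedding lookup is the cheap replacement for a 1-hot input: multiplying a strictly one-hot row by the embedding matrix just selects one row of that matrix, so a lookup by index gives the same result without materializing the sparse product. A minimal NumPy sketch of the equivalence (the sizes here are illustrative assumptions):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "vocab, embed_dim = 27, 128\n",
    "embeddings = np.random.uniform(-1.0, 1.0, size=(vocab, embed_dim))\n",
    "\n",
    "# One-hot row encoding character id 3.\n",
    "one_hot = np.zeros((1, vocab))\n",
    "one_hot[0, 3] = 1.0\n",
    "\n",
    "# The dense matmul and the row lookup agree; the lookup skips the multiply.\n",
    "assert np.allclose(np.dot(one_hot, embeddings), embeddings[3])"
   ]
  },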
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first adapt the LSTM for a single character input with embeddings. The ``feed_dict`` is unchanged, the embeddings are looked up from the inputs. Note that the output is an array probability for the possible characters, not an embedding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "embedding_size = 128 # Dimension of the embedding vector.\n",
    "num_nodes = 64\n",
    "\n",
    "graph = tf.Graph()\n",
    "with graph.as_default():\n",
    "  \n",
    "  # Parameters:\n",
    "  vocabulary_embeddings = tf.Variable(\n",
    "    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))\n",
    "  # Input gate: input, previous output, and bias.\n",
    "  ix = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ib = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Forget gate: input, previous output, and bias.\n",
    "  fx = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  fb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Memory cell: input, state and bias.                             \n",
    "  cx = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  cb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Output gate: input, previous output, and bias.\n",
    "  ox = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ob = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Variables saving state across unrollings.\n",
    "  saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  # Classifier weights and biases.\n",
    "  w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))\n",
    "  b = tf.Variable(tf.zeros([vocabulary_size]))\n",
    "  \n",
    "  # Definition of the cell computation.\n",
    "  def lstm_cell(i, o, state):\n",
    "    \"\"\"Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf\n",
    "    Note that in this formulation, we omit the various connections between the\n",
    "    previous state and the gates.\"\"\"\n",
    "    input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)\n",
    "    forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)\n",
    "    update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb\n",
    "    state = forget_gate * state + input_gate * tf.tanh(update)\n",
    "    output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)\n",
    "    return output_gate * tf.tanh(state), state\n",
    "\n",
    "  # Input data.\n",
    "  train_data = list()\n",
    "  for _ in range(num_unrollings + 1):\n",
    "    train_data.append(\n",
    "      tf.placeholder(tf.float32, shape=[batch_size,vocabulary_size]))\n",
    "  train_inputs = train_data[:num_unrollings]\n",
    "  train_labels = train_data[1:]  # labels are inputs shifted by one time step.\n",
    "\n",
    "  # Unrolled LSTM loop.\n",
    "  outputs = list()\n",
    "  output = saved_output\n",
    "  state = saved_state\n",
    "  for i in train_inputs:\n",
    "    i_embed = tf.nn.embedding_lookup(vocabulary_embeddings, tf.argmax(i, dimension=1))\n",
    "    output, state = lstm_cell(i_embed, output, state)\n",
    "    outputs.append(output)\n",
    "\n",
    "  # State saving across unrollings.\n",
    "  with tf.control_dependencies([saved_output.assign(output),\n",
    "                                saved_state.assign(state)]):\n",
    "    # Classifier.\n",
    "    logits = tf.nn.xw_plus_b(tf.concat(0, outputs), w, b)\n",
    "    loss = tf.reduce_mean(\n",
    "      tf.nn.softmax_cross_entropy_with_logits(\n",
    "        logits, tf.concat(0, train_labels)))\n",
    "\n",
    "  # Optimizer.\n",
    "  global_step = tf.Variable(0)\n",
    "  learning_rate = tf.train.exponential_decay(\n",
    "    10.0, global_step, 5000, 0.1, staircase=True)\n",
    "  optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
    "  gradients, v = zip(*optimizer.compute_gradients(loss))\n",
    "  gradients, _ = tf.clip_by_global_norm(gradients, 1.25)\n",
    "  optimizer = optimizer.apply_gradients(\n",
    "    zip(gradients, v), global_step=global_step)\n",
    "\n",
    "  # Predictions.\n",
    "  train_prediction = tf.nn.softmax(logits)\n",
    "  \n",
    "  # Sampling and validation eval: batch 1, no unrolling.\n",
    "  sample_input = tf.placeholder(tf.float32, shape=[1, vocabulary_size])\n",
    "  sample_input_embedding = tf.nn.embedding_lookup(vocabulary_embeddings, tf.argmax(sample_input, dimension=1))\n",
    "  saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  reset_sample_state = tf.group(\n",
    "    saved_sample_output.assign(tf.zeros([1, num_nodes])),\n",
    "    saved_sample_state.assign(tf.zeros([1, num_nodes])))\n",
    "  sample_output, sample_state = lstm_cell(\n",
    "    sample_input_embedding, saved_sample_output, saved_sample_state)\n",
    "  with tf.control_dependencies([saved_sample_output.assign(sample_output),\n",
    "                                saved_sample_state.assign(sample_state)]):\n",
    "    sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))"
   ]
  },
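  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One detail of the graph above worth spelling out: the unrolled outputs are concatenated along the batch dimension so the classifier runs as a single matmul, and the labels are concatenated in the same order so the rows line up. A toy NumPy sketch of the shape bookkeeping (sizes assumed, not the notebook's hyperparameters):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "batch, unroll, nodes, vocab = 4, 3, 8, 27\n",
    "step_outputs = [np.zeros((batch, nodes)) for _ in range(unroll)]\n",
    "step_labels = [np.zeros((batch, vocab)) for _ in range(unroll)]\n",
    "\n",
    "# Like tf.concat(0, outputs) and tf.concat(0, train_labels) in the graph:\n",
    "stacked = np.concatenate(step_outputs, axis=0)\n",
    "stacked_labels = np.concatenate(step_labels, axis=0)\n",
    "print(stacked.shape, stacked_labels.shape)  # (12, 8) (12, 27)\n",
    "# Row k of the logits computed from stacked and row k of stacked_labels\n",
    "# describe the same (unrolling step, batch element) pair."
   ]
  },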
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized\n",
      "Average loss at step 0: 3.298660 learning rate: 10.000000\n",
      "Minibatch perplexity: 27.08\n",
      "================================================================================\n",
      "qnh vrumdgy alikrxhfi sungvt jebthempdekvu aavrrqm kl ntlvpjwlcyjiybizt ashgw t \n",
      "uz em krrdw  pje segode uffvzeendn e eosaltpkrisuhxvlykx xaofjstdh milcxnoksgoae\n",
      "w  cxhylratk v  pe o grftepc tey meefamtrmpstkn jbibfttht of gcgltje nccxlenegag\n",
      "wonlqmdc lpetrfw  je ofdrq xhnhz n  les eryttqjqjdt sfye l geonuckifmvoeluikswar\n",
      "d  qoyrps  dsh tbs phfdponfketsnmtnvebyfkaoftfntctvxtymr wokates byxcubadc fhaaj\n",
      "================================================================================\n",
      "Validation set perplexity: 18.92\n",
      "Average loss at step 100: 2.281275 learning rate: 10.000000\n",
      "Minibatch perplexity: 8.62\n",
      "Validation set perplexity: 8.51\n",
      "Average loss at step 200: 2.023276 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.84\n",
      "Validation set perplexity: 7.78\n",
      "Average loss at step 300: 1.923201 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.25\n",
      "Validation set perplexity: 6.69\n",
      "Average loss at step 400: 1.866552 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.35\n",
      "Validation set perplexity: 6.67\n",
      "Average loss at step 500: 1.889677 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.88\n",
      "Validation set perplexity: 6.34\n",
      "Average loss at step 600: 1.818804 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.18\n",
      "Validation set perplexity: 6.14\n",
      "Average loss at step 700: 1.802237 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.30\n",
      "Validation set perplexity: 6.11\n",
      "Average loss at step 800: 1.793037 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.87\n",
      "Validation set perplexity: 5.95\n",
      "Average loss at step 900: 1.788941 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.15\n",
      "Validation set perplexity: 5.71\n",
      "Average loss at step 1000: 1.723339 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.42\n",
      "================================================================================\n",
      "wing was to bairage s up distlicutions of or land occoscion pracdryug diectional\n",
      "ments fity highed famal reportibialy used of s prignes on to plart in sege boint\n",
      "am of was opbigificaly tray in commin formcationally viing represents timin of p\n",
      "iga actation or highunger parrar cordinical tinaturester if arminically as as re\n",
      "zin to theirs while and the u one nifger six two smeven six bosh main instuts ca\n",
      "================================================================================\n",
      "Validation set perplexity: 5.59\n",
      "Average loss at step 1100: 1.706271 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.31\n",
      "Validation set perplexity: 5.82\n",
      "Average loss at step 1200: 1.731498 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.18\n",
      "Validation set perplexity: 5.92\n",
      "Average loss at step 1300: 1.712198 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.64\n",
      "Validation set perplexity: 5.60\n",
      "Average loss at step 1400: 1.691341 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.88\n",
      "Validation set perplexity: 5.52\n",
      "Average loss at step 1500: 1.689228 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.23\n",
      "Validation set perplexity: 5.45\n",
      "Average loss at step 1600: 1.685235 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.13\n",
      "Validation set perplexity: 5.43\n",
      "Average loss at step 1700: 1.715750 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.47\n",
      "Validation set perplexity: 5.24\n",
      "Average loss at step 1800: 1.679230 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.54\n",
      "Validation set perplexity: 5.39\n",
      "Average loss at step 1900: 1.682144 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.14\n",
      "Validation set perplexity: 5.44\n",
      "Average loss at step 2000: 1.688184 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.13\n",
      "================================================================================\n",
      "ure is of masii applica phorth jould phan streapwark carderriors a the recena di\n",
      "jacy the mine of hitroducarn life to daira dice activablic directict i for the t\n",
      "man a fortuombent mesord ordwollding the d saver the is chancom basix five onle \n",
      "milies oven markn n mok baying cares fortactations variabrite varis that atton t\n",
      "s of na die daction what etight syre glow profict be basqainfin haman mare that \n",
      "================================================================================\n",
      "Validation set perplexity: 5.47\n",
      "Average loss at step 2100: 1.683748 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.77\n",
      "Validation set perplexity: 5.26\n",
      "Average loss at step 2200: 1.649001 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.21\n",
      "Validation set perplexity: 5.31\n",
      "Average loss at step 2300: 1.665940 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.96\n",
      "Validation set perplexity: 5.17\n",
      "Average loss at step 2400: 1.666254 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.16\n",
      "Validation set perplexity: 5.15\n",
      "Average loss at step 2500: 1.692078 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.98\n",
      "Validation set perplexity: 5.16\n",
      "Average loss at step 2600: 1.661568 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.44\n",
      "Validation set perplexity: 5.12\n",
      "Average loss at step 2700: 1.677059 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.34\n",
      "Validation set perplexity: 5.01\n",
      "Average loss at step 2800: 1.641564 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.10\n",
      "Validation set perplexity: 5.28\n",
      "Average loss at step 2900: 1.650296 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.17\n",
      "Validation set perplexity: 4.96\n",
      "Average loss at step 3000: 1.654272 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.93\n",
      "================================================================================\n",
      "nical time the east wide varues sore eithern the after majord tauk than explanev\n",
      "minate in the celosn was sucring uses in the opposeal princametics in batking di\n",
      "vict three seven the defishuanium spartinatheral ideas the increze first german \n",
      "more the he page as waif u states of the sayash the apriat systemman the mil fol\n",
      "vel who the knights anivers which weilie in the callent may the segally red the \n",
      "================================================================================\n",
      "Validation set perplexity: 5.08\n",
      "Average loss at step 3100: 1.648604 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.69\n",
      "Validation set perplexity: 4.97\n",
      "Average loss at step 3200: 1.647973 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.07\n",
      "Validation set perplexity: 4.88\n",
      "Average loss at step 3300: 1.634662 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.65\n",
      "Validation set perplexity: 5.08\n",
      "Average loss at step 3400: 1.635991 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.93\n",
      "Validation set perplexity: 4.92\n",
      "Average loss at step 3500: 1.626289 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.08\n",
      "Validation set perplexity: 4.95\n",
      "Average loss at step 3600: 1.629943 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.18\n",
      "Validation set perplexity: 5.06\n",
      "Average loss at step 3700: 1.631718 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.13\n",
      "Validation set perplexity: 4.99\n",
      "Average loss at step 3800: 1.623823 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.37\n",
      "Validation set perplexity: 4.80\n",
      "Average loss at step 3900: 1.623366 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.01\n",
      "Validation set perplexity: 5.00\n",
      "Average loss at step 4000: 1.624305 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.74\n",
      "================================================================================\n",
      "gna and awayar than dnears unting the newhalkima stough mainft asso ledits compe\n",
      "dest the pent supernishus calleviobabitustion often the region dvteing regues on\n",
      "ced the consexiss exums a deferation nating mility termering the ame one four ze\n",
      "new he poliman game wing one nine nine eight one eight rusht diclude karsonh a i\n",
      "ment havy the supersions the waiteds broxs the me that in the wem these sevent n\n",
      "================================================================================\n",
      "Validation set perplexity: 5.09\n",
      "Average loss at step 4100: 1.627792 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.34\n",
      "Validation set perplexity: 5.04\n",
      "Average loss at step 4200: 1.613134 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.17\n",
      "Validation set perplexity: 4.97\n",
      "Average loss at step 4300: 1.601257 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.81\n",
      "Validation set perplexity: 5.09\n",
      "Average loss at step 4400: 1.629355 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.85\n",
      "Validation set perplexity: 5.10\n",
      "Average loss at step 4500: 1.638179 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.87\n",
      "Validation set perplexity: 4.91\n",
      "Average loss at step 4600: 1.641622 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.57\n",
      "Validation set perplexity: 4.86\n",
      "Average loss at step 4700: 1.612023 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.13\n",
      "Validation set perplexity: 5.02\n",
      "Average loss at step 4800: 1.598413 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.06\n",
      "Validation set perplexity: 5.05\n",
      "Average loss at step 4900: 1.611767 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.95\n",
      "Validation set perplexity: 4.84\n",
      "Average loss at step 5000: 1.639902 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.98\n",
      "================================================================================\n",
      "dara states whas syrwiving gainsite exicrant where amedinagejurtz deat lokee pac\n",
      "ell as extempersics forms ferches is entire of saq is of mithers other braptrawi\n",
      "nication bylah constamminetolard ware ressuestand that the like the vicials fure\n",
      "male to kus undhemical one nine towile lay calvuil durposes of casmers nasbe of \n",
      "n by the elective law deventuctrtion writinger in the companfy offer teas the in\n",
      "================================================================================\n",
      "Validation set perplexity: 4.90\n",
      "Average loss at step 5100: 1.623560 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.71\n",
      "Validation set perplexity: 4.64\n",
      "Average loss at step 5200: 1.605692 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.74\n",
      "Validation set perplexity: 4.58\n",
      "Average loss at step 5300: 1.576967 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.78\n",
      "Validation set perplexity: 4.59\n",
      "Average loss at step 5400: 1.573950 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.85\n",
      "Validation set perplexity: 4.56\n",
      "Average loss at step 5500: 1.561505 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.13\n",
      "Validation set perplexity: 4.59\n",
      "Average loss at step 5600: 1.589052 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.96\n",
      "Validation set perplexity: 4.53\n",
      "Average loss at step 5700: 1.542748 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.68\n",
      "Validation set perplexity: 4.55\n",
      "Average loss at step 5800: 1.551386 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.72\n",
      "Validation set perplexity: 4.50\n",
      "Average loss at step 5900: 1.571535 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.51\n",
      "Validation set perplexity: 4.51\n",
      "Average loss at step 6000: 1.540324 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.80\n",
      "================================================================================\n",
      "unds in a cloybol sake for the an using the gur northing dum time on only bart r\n",
      "bra german but the certh diction but reactive god of maxall to britar is sophic \n",
      "de edutombia head a runde moders arehim pubser earnier laws on so represent of t\n",
      "makes to a nan stole s birthsmanny extrobatlet of ten one and enter gene there a\n",
      "hered the changes survingual to the ban and aschahialism with was five s heal we\n",
      "================================================================================\n",
      "Validation set perplexity: 4.51\n",
      "Average loss at step 6100: 1.558245 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.80\n",
      "Validation set perplexity: 4.52\n",
      "Average loss at step 6200: 1.577731 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.57\n",
      "Validation set perplexity: 4.51\n",
      "Average loss at step 6300: 1.590316 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.08\n",
      "Validation set perplexity: 4.47\n",
      "Average loss at step 6400: 1.624058 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.18\n",
      "Validation set perplexity: 4.42\n",
      "Average loss at step 6500: 1.615708 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.78\n",
      "Validation set perplexity: 4.41\n",
      "Average loss at step 6600: 1.583319 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.44\n",
      "Validation set perplexity: 4.40\n",
      "Average loss at step 6700: 1.573038 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.93\n",
      "Validation set perplexity: 4.41\n",
      "Average loss at step 6800: 1.547799 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.46\n",
      "Validation set perplexity: 4.40\n",
      "Average loss at step 6900: 1.547352 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.79\n",
      "Validation set perplexity: 4.40\n",
      "Average loss at step 7000: 1.561494 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.95\n",
      "================================================================================\n",
      "y in early sinke graptinner and spectifiem crinay is the firker to nace own the \n",
      "one perkorg efgil would decrease from the dam far coak minification econglishest\n",
      "formace and the selech were inch e traped by quickly women kor refish alsopdanph\n",
      " gamining under s the in the preferenced lahinor new are external used for of bu\n",
      "fium for six nine two zero zero nine strundes occund racy the origins as result \n",
      "================================================================================\n",
      "Validation set perplexity: 4.36\n"
     ]
    }
   ],
   "source": [
    "num_steps = 7001\n",
    "summary_frequency = 100\n",
    "\n",
    "with tf.Session(graph=graph) as session:\n",
    "  tf.initialize_all_variables().run()\n",
    "  print('Initialized')\n",
    "  mean_loss = 0\n",
    "  for step in range(num_steps):\n",
    "    batches = train_batches.next()\n",
    "    feed_dict = dict()\n",
    "    for i in range(num_unrollings + 1):\n",
    "      feed_dict[train_data[i]] = batches[i]\n",
    "    _, l, predictions, lr = session.run(\n",
    "      [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)\n",
    "    mean_loss += l\n",
    "    if step % summary_frequency == 0:\n",
    "      if step > 0:\n",
    "        mean_loss = mean_loss / summary_frequency\n",
    "      # The mean loss is an estimate of the loss over the last few batches.\n",
    "      print(\n",
    "        'Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))\n",
    "      mean_loss = 0\n",
    "      labels = np.concatenate(list(batches)[1:])\n",
    "      print('Minibatch perplexity: %.2f' % float(\n",
    "        np.exp(logprob(predictions, labels))))\n",
    "      if step % (summary_frequency * 10) == 0:\n",
    "        # Generate some samples.\n",
    "        print('=' * 80)\n",
    "        for _ in range(5):\n",
    "          feed = sample(random_distribution())\n",
    "          sentence = characters(feed)[0]\n",
    "          reset_sample_state.run()\n",
    "          for _ in range(79):\n",
    "            prediction = sample_prediction.eval({sample_input: feed})\n",
    "            feed = sample(prediction)\n",
    "            sentence += characters(feed)[0]\n",
    "          print(sentence)\n",
    "        print('=' * 80)\n",
    "      # Measure validation set perplexity.\n",
    "      reset_sample_state.run()\n",
    "      valid_logprob = 0\n",
    "      for _ in range(valid_size):\n",
    "        b = valid_batches.next()\n",
    "        predictions = sample_prediction.eval({sample_input: b[0]})\n",
    "        valid_logprob = valid_logprob + logprob(predictions, b[1])\n",
    "      print('Validation set perplexity: %.2f' % float(np.exp(\n",
    "        valid_logprob / valid_size)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now use bigrams as inputs for the training. Here again, the ``feed_dict`` is unchanged, the bigram embeddings are looked up from the inputs. The output of the LSTM is still a probability array of the possible characters (not bigrams)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "embedding_size = 128 # Dimension of the embedding vector.\n",
    "num_nodes = 64\n",
    "\n",
    "graph = tf.Graph()\n",
    "with graph.as_default():\n",
    "  \n",
    "  # Parameters:\n",
    "  vocabulary_embeddings = tf.Variable(\n",
    "    tf.random_uniform([vocabulary_size * vocabulary_size, embedding_size], -1.0, 1.0))\n",
    "  # Input gate: input, previous output, and bias.\n",
    "  ix = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ib = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Forget gate: input, previous output, and bias.\n",
    "  fx = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  fb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Memory cell: input, state and bias.                             \n",
    "  cx = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  cb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Output gate: input, previous output, and bias.\n",
    "  ox = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ob = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Variables saving state across unrollings.\n",
    "  saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  # Classifier weights and biases.\n",
    "  w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))\n",
    "  b = tf.Variable(tf.zeros([vocabulary_size]))\n",
    "  \n",
    "  # Definition of the cell computation.\n",
    "  def lstm_cell(i, o, state):\n",
    "    \"\"\"Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf\n",
    "    Note that in this formulation, we omit the various connections between the\n",
    "    previous state and the gates.\"\"\"\n",
    "    input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)\n",
    "    forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)\n",
    "    update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb\n",
    "    state = forget_gate * state + input_gate * tf.tanh(update)\n",
    "    output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)\n",
    "    return output_gate * tf.tanh(state), state\n",
    "\n",
    "  # Input data.\n",
    "  train_data = list()\n",
    "  for _ in range(num_unrollings + 1):\n",
    "    train_data.append(\n",
    "      tf.placeholder(tf.float32, shape=[batch_size,vocabulary_size]))\n",
    "  train_chars = train_data[:num_unrollings]\n",
    "  train_inputs = zip(train_chars[:-1], train_chars[1:])\n",
    "  train_labels = train_data[2:]  # labels are inputs shifted by one time step.\n",
    "\n",
    "  # Unrolled LSTM loop.\n",
    "  outputs = list()\n",
    "  output = saved_output\n",
    "  state = saved_state\n",
    "  for i in train_inputs:\n",
    "    #print(i.get_shape())\n",
    "    #print(i)\n",
    "    bigram_index = tf.argmax(i[0], dimension=1) + vocabulary_size * tf.argmax(i[1], dimension=1)\n",
    "    i_embed = tf.nn.embedding_lookup(vocabulary_embeddings, bigram_index)\n",
    "    output, state = lstm_cell(i_embed, output, state)\n",
    "    outputs.append(output)\n",
    "\n",
    "  # State saving across unrollings.\n",
    "  with tf.control_dependencies([saved_output.assign(output),\n",
    "                                saved_state.assign(state)]):\n",
    "    # Classifier.\n",
    "    logits = tf.nn.xw_plus_b(tf.concat(0, outputs), w, b)\n",
    "    #print(logits.get_shape())\n",
    "    #print(tf.concat(0, train_labels).get_shape())\n",
    "    loss = tf.reduce_mean(\n",
    "      tf.nn.softmax_cross_entropy_with_logits(\n",
    "        logits, tf.concat(0, train_labels)))\n",
    "\n",
    "  # Optimizer.\n",
    "  global_step = tf.Variable(0)\n",
    "  learning_rate = tf.train.exponential_decay(\n",
    "    10.0, global_step, 5000, 0.1, staircase=True)\n",
    "  optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
    "  gradients, v = zip(*optimizer.compute_gradients(loss))\n",
    "  gradients, _ = tf.clip_by_global_norm(gradients, 1.25)\n",
    "  optimizer = optimizer.apply_gradients(\n",
    "    zip(gradients, v), global_step=global_step)\n",
    "\n",
    "  # Predictions.\n",
    "  train_prediction = tf.nn.softmax(logits)\n",
    "  \n",
    "  # Sampling and validation eval: batch 1, no unrolling.\n",
    "  #sample_input = tf.placeholder(tf.float32, shape=[1, vocabulary_size])\n",
    "  sample_input = list()\n",
    "  for _ in range(2):\n",
    "    sample_input.append(tf.placeholder(tf.float32, shape=[1, vocabulary_size]))\n",
    "  samp_in_index = tf.argmax(sample_input[0], dimension=1) + vocabulary_size * tf.argmax(sample_input[1], dimension=1)\n",
    "  sample_input_embedding = tf.nn.embedding_lookup(vocabulary_embeddings, samp_in_index)\n",
    "  saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  reset_sample_state = tf.group(\n",
    "    saved_sample_output.assign(tf.zeros([1, num_nodes])),\n",
    "    saved_sample_state.assign(tf.zeros([1, num_nodes])))\n",
    "  sample_output, sample_state = lstm_cell(\n",
    "    sample_input_embedding, saved_sample_output, saved_sample_state)\n",
    "  with tf.control_dependencies([saved_sample_output.assign(sample_output),\n",
    "                                saved_sample_state.assign(sample_state)]):\n",
    "    sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))"
   ]
  },
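  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bigram id used above packs the two character ids into a single index in ``[0, vocabulary_size**2)``, which is what makes the ``vocabulary_size * vocabulary_size`` embedding table addressable. A small sketch of the packing and why it is collision-free (sizes assumed):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "vocab = 27\n",
    "first, second = 3, 15  # two character ids forming one bigram\n",
    "\n",
    "# Same packing as in the graph: first + vocab * second.\n",
    "bigram_id = first + vocab * second\n",
    "assert 0 <= bigram_id < vocab * vocab\n",
    "\n",
    "# The mapping is invertible, so every bigram gets its own embedding row.\n",
    "assert (bigram_id % vocab, bigram_id // vocab) == (first, second)"
   ]
  },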
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized\n",
      "Average loss at step 0: 3.282539 learning rate: 10.000000\n",
      "Minibatch perplexity: 26.64\n",
      "================================================================================\n",
      "in e de nni op ejo  vu vn s kk aeou g sdd v ye t aj uarophrv snfe yoxuwrkt w im  \n",
      "nge  tep ey ard v f uifjs poozafb hht wkpxszueldq ioe w hn  foivijrhneo l nouin u\n",
      "pdvegeesnivn oy nvptnetrm  cnnnut  y se p aknnhhgxxe er nehh sju l o olrnt  mb xf\n",
      "hsaoa e p zbilz ozih e m dlqmxayemexaa  lb vr nc zxntekger umtvsoekpz zd nfj mohb\n",
      "munb   dq c j ozpqbkgcsvydyr  ort  nz b   cz ppslznpahqnoxecvdyg hnwuay   r vft m\n",
      "================================================================================\n",
      "Validation set perplexity: 20.78\n",
      "Average loss at step 100: 2.274602 learning rate: 10.000000\n",
      "Minibatch perplexity: 7.63\n",
      "Validation set perplexity: 8.93\n",
      "Average loss at step 200: 1.970952 learning rate: 10.000000\n",
      "Minibatch perplexity: 7.25\n",
      "Validation set perplexity: 8.18\n",
      "Average loss at step 300: 1.882643 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.21\n",
      "Validation set perplexity: 7.88\n",
      "Average loss at step 400: 1.827048 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.03\n",
      "Validation set perplexity: 7.76\n",
      "Average loss at step 500: 1.762147 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.04\n",
      "Validation set perplexity: 7.68\n",
      "Average loss at step 600: 1.761574 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.88\n",
      "Validation set perplexity: 7.82\n",
      "Average loss at step 700: 1.740767 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.12\n",
      "Validation set perplexity: 7.44\n",
      "Average loss at step 800: 1.725738 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.20\n",
      "Validation set perplexity: 7.55\n",
      "Average loss at step 900: 1.717153 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.14\n",
      "Validation set perplexity: 7.31\n",
      "Average loss at step 1000: 1.687027 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.26\n",
      "================================================================================\n",
      "ized the lating in lits intellection don feued by bapc pe six nine seven six sfaj\n",
      "uayear that that the analouyeraorsions aves creat an indy pastond jound and a p g\n",
      "qo three mying mempf indission conduction to which was this during with exold and\n",
      "oof and one vier yal good and eachs division wut town the the in asistar rederali\n",
      " possions oction sigmamic socients influentar in devi national and rous bantries \n",
      "================================================================================\n",
      "Validation set perplexity: 7.52\n",
      "Average loss at step 1100: 1.691836 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.43\n",
      "Validation set perplexity: 7.58\n",
      "Average loss at step 1200: 1.690610 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.88\n",
      "Validation set perplexity: 7.45\n",
      "Average loss at step 1300: 1.690477 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.63\n",
      "Validation set perplexity: 7.25\n",
      "Average loss at step 1400: 1.660229 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.33\n",
      "Validation set perplexity: 7.36\n",
      "Average loss at step 1500: 1.648446 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.57\n",
      "Validation set perplexity: 7.58\n",
      "Average loss at step 1600: 1.637913 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.60\n",
      "Validation set perplexity: 7.59\n",
      "Average loss at step 1700: 1.650540 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.92\n",
      "Validation set perplexity: 6.86\n",
      "Average loss at step 1800: 1.666902 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.31\n",
      "Validation set perplexity: 7.08\n",
      "Average loss at step 1900: 1.647813 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.93\n",
      "Validation set perplexity: 6.80\n",
      "Average loss at step 2000: 1.662696 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.48\n",
      "================================================================================\n",
      "ber manged secut priend and comptually the acrons and scohndor and begarding of p\n",
      "xambinaly infracter to the feaced gibrate editals was on the land rope fector fiv\n",
      "ht separe wtoring seatratudio inded headitise nots and profeat worty people ujn m\n",
      "cs a femlb five confline of hideded steaking amemerciect of the ti ward marks to \n",
      "gold of relatting oppose zooprocessb the can kummoricterally soviely gen any of a\n",
      "================================================================================\n",
      "Validation set perplexity: 6.82\n",
      "Average loss at step 2100: 1.644001 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.36\n",
      "Validation set perplexity: 6.57\n",
      "Average loss at step 2200: 1.661322 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.46\n",
      "Validation set perplexity: 6.90\n",
      "Average loss at step 2300: 1.642042 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.59\n",
      "Validation set perplexity: 6.81\n",
      "Average loss at step 2400: 1.641949 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.94\n",
      "Validation set perplexity: 7.11\n",
      "Average loss at step 2500: 1.654709 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.40\n",
      "Validation set perplexity: 7.22\n",
      "Average loss at step 2600: 1.639862 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.00\n",
      "Validation set perplexity: 6.90\n",
      "Average loss at step 2700: 1.620696 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.24\n",
      "Validation set perplexity: 7.02\n",
      "Average loss at step 2800: 1.620937 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.11\n",
      "Validation set perplexity: 6.75\n",
      "Average loss at step 2900: 1.620705 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.11\n",
      "Validation set perplexity: 7.01\n",
      "Average loss at step 3000: 1.639649 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.06\n",
      "================================================================================\n",
      "tv in one rovementay no savisolu s greass and the parlia decelevisular for ambodo\n",
      "uwsresrriation islas ettwr seriot and the heddocx laterises in eight in five text\n",
      "tnistici given excepted grammirch the rescientitial and a presidecii kon execucep\n",
      "wcity markin la revels scis wound in the interresirriges or intelliberict two eig\n",
      "dd neelous in processor valuens write sowth of but the poliscommede of eucasideba\n",
      "================================================================================\n",
      "Validation set perplexity: 7.09\n",
      "Average loss at step 3100: 1.615106 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.24\n",
      "Validation set perplexity: 7.03\n",
      "Average loss at step 3200: 1.623724 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.39\n",
      "Validation set perplexity: 7.07\n",
      "Average loss at step 3300: 1.626038 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.29\n",
      "Validation set perplexity: 7.05\n",
      "Average loss at step 3400: 1.619675 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.58\n",
      "Validation set perplexity: 6.72\n",
      "Average loss at step 3500: 1.604694 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.98\n",
      "Validation set perplexity: 6.88\n",
      "Average loss at step 3600: 1.626160 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.97\n",
      "Validation set perplexity: 7.05\n",
      "Average loss at step 3700: 1.597278 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.71\n",
      "Validation set perplexity: 7.16\n",
      "Average loss at step 3800: 1.591539 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.68\n",
      "Validation set perplexity: 7.13\n",
      "Average loss at step 3900: 1.585760 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.97\n",
      "Validation set perplexity: 6.92\n",
      "Average loss at step 4000: 1.602616 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.53\n",
      "================================================================================\n",
      "qh fhock inable bitle around can kery subject oaeh history forican  phine negb or\n",
      "vring and their immi been the vallaked inner from that the cop and raplobertuded \n",
      "zurces their known scand win their examples economically is growes an economic on\n",
      "ir deaking collettled form sides libertures in the instilise of f canmation to ve\n",
      "seinniet that u shalgeme it with elax found with explayer nighus darch tham thenr\n",
      "================================================================================\n",
      "Validation set perplexity: 7.19\n",
      "Average loss at step 4100: 1.618380 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.83\n",
      "Validation set perplexity: 7.29\n",
      "Average loss at step 4200: 1.598944 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.27\n",
      "Validation set perplexity: 6.86\n",
      "Average loss at step 4300: 1.568215 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.79\n",
      "Validation set perplexity: 6.99\n",
      "Average loss at step 4400: 1.592085 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.92\n",
      "Validation set perplexity: 6.96\n",
      "Average loss at step 4500: 1.578493 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.83\n",
      "Validation set perplexity: 6.92\n",
      "Average loss at step 4600: 1.585184 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.84\n",
      "Validation set perplexity: 6.95\n",
      "Average loss at step 4700: 1.596289 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.85\n",
      "Validation set perplexity: 6.86\n",
      "Average loss at step 4800: 1.592163 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.41\n",
      "Validation set perplexity: 7.35\n",
      "Average loss at step 4900: 1.610021 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.73\n",
      "Validation set perplexity: 6.83\n",
      "Average loss at step 5000: 1.616509 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.97\n",
      "================================================================================\n",
      "dz sungemen dong the guysneal capito members forcestrated that ease it forci kate\n",
      "hreen of time fin counefy wuco maka epte son the and a sture to lct for a abortur\n",
      "h some the agegish things two m bet brown knowledthas minist continclophologistim\n",
      "rged the game sencus rang meful paugions of this linell legnity war pearlaissuedi\n",
      "uu print of when psifer i bult of st was addy the music political who imports and\n",
      "================================================================================\n",
      "Validation set perplexity: 6.94\n",
      "Average loss at step 5100: 1.582792 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.03\n",
      "Validation set perplexity: 6.89\n",
      "Average loss at step 5200: 1.591997 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.71\n",
      "Validation set perplexity: 6.81\n",
      "Average loss at step 5300: 1.562739 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.35\n",
      "Validation set perplexity: 6.75\n",
      "Average loss at step 5400: 1.558668 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.35\n",
      "Validation set perplexity: 6.65\n",
      "Average loss at step 5500: 1.554051 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.66\n",
      "Validation set perplexity: 6.70\n",
      "Average loss at step 5600: 1.542350 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.45\n",
      "Validation set perplexity: 6.72\n",
      "Average loss at step 5700: 1.572353 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.80\n",
      "Validation set perplexity: 6.74\n",
      "Average loss at step 5800: 1.562978 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.24\n",
      "Validation set perplexity: 6.63\n",
      "Average loss at step 5900: 1.569542 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.65\n",
      "Validation set perplexity: 6.65\n",
      "Average loss at step 6000: 1.531039 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.10\n",
      "================================================================================\n",
      "rx by a in the trich again in such roma president  rabbt see form a crows of the \n",
      "fference external loma oneller or chine the behod and age donneyna lining draft e\n",
      "yhs many to be within one nine six three the music constitute the two canal the r\n",
      "tball ii fire six at of the conterchad perror other profician enginessional autom\n",
      "pyth injoectuding the his minical relatic compositic his basina league the playow\n",
      "================================================================================\n",
      "Validation set perplexity: 6.57\n",
      "Average loss at step 6100: 1.582103 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.68\n",
      "Validation set perplexity: 6.66\n",
      "Average loss at step 6200: 1.576813 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.14\n",
      "Validation set perplexity: 6.65\n",
      "Average loss at step 6300: 1.565072 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.00\n",
      "Validation set perplexity: 6.71\n",
      "Average loss at step 6400: 1.579963 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.23\n",
      "Validation set perplexity: 6.78\n",
      "Average loss at step 6500: 1.571453 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.23\n",
      "Validation set perplexity: 6.73\n",
      "Average loss at step 6600: 1.565176 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.81\n",
      "Validation set perplexity: 6.63\n",
      "Average loss at step 6700: 1.558820 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.69\n",
      "Validation set perplexity: 6.69\n",
      "Average loss at step 6800: 1.571175 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.31\n",
      "Validation set perplexity: 6.65\n",
      "Average loss at step 6900: 1.602558 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.92\n",
      "Validation set perplexity: 6.66\n",
      "Average loss at step 7000: 1.585900 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.91\n",
      "================================================================================\n",
      "mn the gened to the treerced theulence a soved the wood isromes has sics bruschia\n",
      "jlam the eight seven zero four two one sweber one nine twiletickey id spered in c\n",
      "rx but of this ortal cases were woed this one the chrure crespence respond to or \n",
      "gby for the ary ad and rights criticism may enjung astronei the divisions one nin\n",
      "djds is interpret in city begation he score a less meants it in auguk more that t\n",
      "================================================================================\n",
      "Validation set perplexity: 6.66\n"
     ]
    }
   ],
   "source": [
    "import collections\n",
    "num_steps = 7001\n",
    "summary_frequency = 100\n",
    "\n",
    "valid_batches = BatchGenerator(valid_text, 1, 2)\n",
    "\n",
    "with tf.Session(graph=graph) as session:\n",
    "  tf.initialize_all_variables().run()\n",
    "  print('Initialized')\n",
    "  mean_loss = 0\n",
    "  for step in range(num_steps):\n",
    "    batches = train_batches.next()\n",
    "    feed_dict = dict()\n",
    "    for i in range(num_unrollings + 1):\n",
    "      feed_dict[train_data[i]] = batches[i]\n",
    "    _, l, predictions, lr = session.run(\n",
    "      [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)\n",
    "    mean_loss += l\n",
    "    if step % summary_frequency == 0:\n",
    "      if step > 0:\n",
    "        mean_loss = mean_loss / summary_frequency\n",
    "      # The mean loss is an estimate of the loss over the last few batches.\n",
    "      print(\n",
    "        'Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))\n",
    "      mean_loss = 0\n",
    "      labels = np.concatenate(list(batches)[2:])\n",
    "      print('Minibatch perplexity: %.2f' % float(\n",
    "        np.exp(logprob(predictions, labels))))\n",
    "      if step % (summary_frequency * 10) == 0:\n",
    "        # Generate some samples.\n",
    "        print('=' * 80)\n",
    "        for _ in range(5):\n",
    "          #feed = sample(random_distribution())\n",
    "          feed = collections.deque(maxlen=2)\n",
    "          for _ in range(2):  \n",
    "            feed.append(random_distribution())\n",
    "          #sentence = characters(feed)[0]\n",
    "          sentence = characters(feed[0])[0] + characters(feed[1])[0]\n",
    "          #print(sentence)\n",
    "          #print(feed)\n",
    "          reset_sample_state.run()\n",
    "          for _ in range(79):\n",
    "            prediction = sample_prediction.eval({\n",
    "                    sample_input[0]: feed[0],\n",
    "                    sample_input[1]: feed[1]\n",
    "                })\n",
    "            #feed = sample(prediction)\n",
    "            feed.append(sample(prediction))\n",
    "            #sentence += characters(feed)[0]\n",
    "            sentence += characters(feed[1])[0]\n",
    "          print(sentence)\n",
    "        print('=' * 80)\n",
    "      # Measure validation set perplexity.\n",
    "      reset_sample_state.run()\n",
    "      valid_logprob = 0\n",
    "      for _ in range(valid_size):\n",
    "        b = valid_batches.next()\n",
    "        predictions = sample_prediction.eval({\n",
    "                    sample_input[0]: b[0],\n",
    "                    sample_input[1]: b[1]\n",
    "            })\n",
    "        valid_logprob = valid_logprob + logprob(predictions, b[2])\n",
    "      print('Validation set perplexity: %.2f' % float(np.exp(\n",
    "        valid_logprob / valid_size)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It works, but the validation perplexity is a bit worst.\n",
    "\n",
    "Let's try the dropout, in the inputs/ouputs only, not between to cells. "
   ]
  },
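  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of that placement (shapes and the `keep_prob` value below are illustrative, not taken from the graph that follows): dropout is applied only to the non-recurrent connections, i.e. the cell input and the cell output, while the recurrent state flows through untouched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "# Sketch only: illustrative shapes, not the real graph defined below.\n",
    "keep_prob = 0.8  # 1.0 would disable dropout entirely\n",
    "x = tf.placeholder(tf.float32, shape=[1, 128])  # cell input, e.g. an embedding\n",
    "h = tf.placeholder(tf.float32, shape=[1, 64])   # previous cell output\n",
    "\n",
    "drop_x = tf.nn.dropout(x, keep_prob)  # input connection: dropped\n",
    "# ... the LSTM cell would consume (drop_x, h); its state is never dropped ...\n",
    "drop_h = tf.nn.dropout(h, keep_prob)  # output connection: dropped before the classifier"
   ]
  },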
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "embedding_size = 128 # Dimension of the embedding vector.\n",
    "num_nodes = 64\n",
    "keep_prob_train = 1.0\n",
    "\n",
    "graph = tf.Graph()\n",
    "with graph.as_default():\n",
    "  \n",
    "  # Parameters:\n",
    "  vocabulary_embeddings = tf.Variable(\n",
    "    tf.random_uniform([vocabulary_size * vocabulary_size, embedding_size], -1.0, 1.0))\n",
    "  # Input gate: input, previous output, and bias.\n",
    "  ix = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ib = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Forget gate: input, previous output, and bias.\n",
    "  fx = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  fb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Memory cell: input, state and bias.                             \n",
    "  cx = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  cb = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Output gate: input, previous output, and bias.\n",
    "  ox = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], -0.1, 0.1))\n",
    "  om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))\n",
    "  ob = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  # Variables saving state across unrollings.\n",
    "  saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)\n",
    "  # Classifier weights and biases.\n",
    "  w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))\n",
    "  b = tf.Variable(tf.zeros([vocabulary_size]))\n",
    "  \n",
    "  # Definition of the cell computation.\n",
    "  def lstm_cell(i, o, state):\n",
    "    \"\"\"Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf\n",
    "    Note that in this formulation, we omit the various connections between the\n",
    "    previous state and the gates.\"\"\"\n",
    "    input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)\n",
    "    forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)\n",
    "    update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb\n",
    "    state = forget_gate * state + input_gate * tf.tanh(update)\n",
    "    output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)\n",
    "    return output_gate * tf.tanh(state), state\n",
    "  \n",
    "  # Input data.\n",
    "  train_data = list()\n",
    "  for _ in range(num_unrollings + 1):\n",
    "    train_data.append(\n",
    "      tf.placeholder(tf.float32, shape=[batch_size,vocabulary_size]))\n",
    "  train_chars = train_data[:num_unrollings]\n",
    "  train_inputs = zip(train_chars[:-1], train_chars[1:])\n",
    "  train_labels = train_data[2:]  # labels are inputs shifted by one time step.\n",
    "\n",
    "  # Unrolled LSTM loop.\n",
    "  outputs = list()\n",
    "  output = saved_output\n",
    "  state = saved_state\n",
    "  for i in train_inputs:\n",
    "    bigram_index = tf.argmax(i[0], dimension=1) + vocabulary_size * tf.argmax(i[1], dimension=1)\n",
    "    i_embed = tf.nn.embedding_lookup(vocabulary_embeddings, bigram_index)\n",
    "    drop_i = tf.nn.dropout(i_embed, keep_prob_train)\n",
    "    output, state = lstm_cell(drop_i, output, state)\n",
    "    outputs.append(output)\n",
    "\n",
    "  # State saving across unrollings.\n",
    "  with tf.control_dependencies([saved_output.assign(output),\n",
    "                                saved_state.assign(state)]):\n",
    "    # Classifier.\n",
    "    logits = tf.nn.xw_plus_b(tf.concat(0, outputs), w, b)\n",
    "    drop_logits = tf.nn.dropout(logits, keep_prob_train)\n",
    "    loss = tf.reduce_mean(\n",
    "      tf.nn.softmax_cross_entropy_with_logits(\n",
    "        logits, tf.concat(0, train_labels)))\n",
    "\n",
    "  # Optimizer.\n",
    "  global_step = tf.Variable(0)\n",
    "  learning_rate = tf.train.exponential_decay(\n",
    "    10.0, global_step, 15000, 0.1, staircase=True)\n",
    "  optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
    "  gradients, v = zip(*optimizer.compute_gradients(loss))\n",
    "  gradients, _ = tf.clip_by_global_norm(gradients, 1.25)\n",
    "  optimizer = optimizer.apply_gradients(\n",
    "    zip(gradients, v), global_step=global_step)\n",
    "\n",
    "  # Predictions.\n",
    "  train_prediction = tf.nn.softmax(logits)\n",
    "  \n",
    "  # Sampling and validation eval: batch 1, no unrolling.\n",
    "  #sample_input = tf.placeholder(tf.float32, shape=[1, vocabulary_size])\n",
    "  keep_prob_sample = tf.placeholder(tf.float32)\n",
    "  sample_input = list()\n",
    "  for _ in range(2):\n",
    "    sample_input.append(tf.placeholder(tf.float32, shape=[1, vocabulary_size]))\n",
    "  samp_in_index = tf.argmax(sample_input[0], dimension=1) + vocabulary_size * tf.argmax(sample_input[1], dimension=1)\n",
    "  sample_input_embedding = tf.nn.embedding_lookup(vocabulary_embeddings, samp_in_index)\n",
    "  saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))\n",
    "  reset_sample_state = tf.group(\n",
    "    saved_sample_output.assign(tf.zeros([1, num_nodes])),\n",
    "    saved_sample_state.assign(tf.zeros([1, num_nodes])))\n",
    "  sample_output, sample_state = lstm_cell(\n",
    "    sample_input_embedding, saved_sample_output, saved_sample_state)\n",
    "  with tf.control_dependencies([saved_sample_output.assign(sample_output),\n",
    "                                saved_sample_state.assign(sample_state)]):\n",
    "    sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))"
   ]
  },
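  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the bigram embedding lookup above concrete, here is a small standalone check (the `one_hot` helper and the character ids are illustrative): two one-hot character vectors map to the row index `argmax(first) + vocabulary_size * argmax(second)` of the `vocabulary_size * vocabulary_size` embedding table, so every bigram gets its own embedding row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "vocab = 27  # vocabulary_size in this notebook: 'a'-'z' plus space\n",
    "\n",
    "def one_hot(char_id, size=vocab):\n",
    "  v = np.zeros((1, size), dtype=np.float32)\n",
    "  v[0, char_id] = 1.0\n",
    "  return v\n",
    "\n",
    "first, second = one_hot(1), one_hot(2)  # e.g. ids 1 and 2\n",
    "bigram_index = np.argmax(first, axis=1) + vocab * np.argmax(second, axis=1)\n",
    "print(bigram_index)  # [55] == 1 + 27 * 2: a unique row per bigram"
   ]
  },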
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized\n",
      "Average loss at step 0: 3.294064 learning rate: 10.000000\n",
      "Minibatch perplexity: 26.95\n",
      "================================================================================\n",
      "vl vfotjaiemnztm  tbteoi  ydqhqwdxsa gtthe qen q  xmkxoetugabnlvi  rkhraoenuhexa \n",
      "fmez bi ari wdcecwbgpqppuoqsukesr nkliilkth qbsf irewik n efttbr q g ad coten  cj\n",
      "rcyivrehlfecveas h oc eniw pktr yrun eedrmneveoxqktu cbeedcysap ziliiwaei teti p \n",
      "grfdie arijhssbceeqyojethev  haawrcvehst  mr alqe v iwnwuevp tie oettynifk se oei\n",
      "fy aytqvo kpgaf  ozt blijwsueirpn  odifomkiu ulyezr rw thessob ywetrtnvi tdezileh\n",
      "================================================================================\n",
      "Validation set perplexity: 20.76\n",
      "Average loss at step 100: 2.294127 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.74\n",
      "Validation set perplexity: 9.11\n",
      "Average loss at step 200: 1.970135 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.24\n",
      "Validation set perplexity: 8.31\n",
      "Average loss at step 300: 1.875337 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.24\n",
      "Validation set perplexity: 8.21\n",
      "Average loss at step 400: 1.821280 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.02\n",
      "Validation set perplexity: 8.56\n",
      "Average loss at step 500: 1.793596 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.78\n",
      "Validation set perplexity: 7.96\n",
      "Average loss at step 600: 1.750613 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.04\n",
      "Validation set perplexity: 8.15\n",
      "Average loss at step 700: 1.744874 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.99\n",
      "Validation set perplexity: 7.91\n",
      "Average loss at step 800: 1.709380 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.83\n",
      "Validation set perplexity: 8.17\n",
      "Average loss at step 900: 1.707013 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.91\n",
      "Validation set perplexity: 7.75\n",
      "Average loss at step 1000: 1.693171 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.33\n",
      "================================================================================\n",
      "qter one nine threge direct bre a on to probut which with duide a sciences the va\n",
      "ue a so a cering packey which pland demolow in becutkul and angos it would more r\n",
      "gc three two zero zero memmas of locomust politer that doy by fawn  the one nine \n",
      "ni the posnic in varian leb vie by capter that two zero zero zero zis three phili\n",
      "yfl a propolies belothous chessapolid to not had by of polisted plands and empiri\n",
      "================================================================================\n",
      "Validation set perplexity: 8.10\n",
      "Average loss at step 1100: 1.686150 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.13\n",
      "Validation set perplexity: 8.03\n",
      "Average loss at step 1200: 1.681396 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.95\n",
      "Validation set perplexity: 7.77\n",
      "Average loss at step 1300: 1.665828 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.37\n",
      "Validation set perplexity: 8.41\n",
      "Average loss at step 1400: 1.668814 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.27\n",
      "Validation set perplexity: 7.99\n",
      "Average loss at step 1500: 1.688824 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.59\n",
      "Validation set perplexity: 8.03\n",
      "Average loss at step 1600: 1.680478 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.22\n",
      "Validation set perplexity: 7.46\n",
      "Average loss at step 1700: 1.653859 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.55\n",
      "Validation set perplexity: 7.71\n",
      "Average loss at step 1800: 1.679414 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.32\n",
      "Validation set perplexity: 7.69\n",
      "Average loss at step 1900: 1.685665 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.65\n",
      "Validation set perplexity: 7.86\n",
      "Average loss at step 2000: 1.643932 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.12\n",
      "================================================================================\n",
      "pdon sate where five nine zero d and to second colorigantion had upse origh jown \n",
      "vronomics claire of capply sesed by though but lia a myand hoing ii outsignor org\n",
      "qrlives colre is in open chien areaski with lic seven three a foreas was frankds \n",
      "qc polining joint jouseven dive two zero six six five four three one two four inv\n",
      "kbts todep only biology in the potensional at the computer preacheal living enthe\n",
      "================================================================================\n",
      "Validation set perplexity: 7.46\n",
      "Average loss at step 2100: 1.645154 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.24\n",
      "Validation set perplexity: 7.28\n",
      "Average loss at step 2200: 1.626880 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.91\n",
      "Validation set perplexity: 7.38\n",
      "Average loss at step 2300: 1.662672 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.69\n",
      "Validation set perplexity: 7.77\n",
      "Average loss at step 2400: 1.654222 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.69\n",
      "Validation set perplexity: 7.70\n",
      "Average loss at step 2500: 1.633783 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.09\n",
      "Validation set perplexity: 7.23\n",
      "Average loss at step 2600: 1.619199 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.86\n",
      "Validation set perplexity: 7.34\n",
      "Average loss at step 2700: 1.620473 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.20\n",
      "Validation set perplexity: 7.52\n",
      "Average loss at step 2800: 1.625931 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.26\n",
      "Validation set perplexity: 7.36\n",
      "Average loss at step 2900: 1.602973 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.02\n",
      "Validation set perplexity: 7.29\n",
      "Average loss at step 3000: 1.604709 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.07\n",
      "================================================================================\n",
      "afted the competitivity and the flop simple cromis performal expeacited self mont\n",
      "tch at this general church dpoind vances dml frandhio two three intructed of for \n",
      "cn for economen a scountinity enderton westral in the tramerfeled german iracient\n",
      "sxn b ordinal bandard ited french tenism dailea were aray how the pkonium large t\n",
      "vin and at the war ling letural soberted the operaterol of linxustanimatic ruary \n",
      "================================================================================\n",
      "Validation set perplexity: 7.39\n",
      "Average loss at step 3100: 1.628776 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.79\n",
      "Validation set perplexity: 7.38\n",
      "Average loss at step 3200: 1.627529 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.69\n",
      "Validation set perplexity: 7.33\n",
      "Average loss at step 3300: 1.615460 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.80\n",
      "Validation set perplexity: 7.16\n",
      "Average loss at step 3400: 1.608572 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.12\n",
      "Validation set perplexity: 7.51\n",
      "Average loss at step 3500: 1.603931 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.92\n",
      "Validation set perplexity: 6.89\n",
      "Average loss at step 3600: 1.575938 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.33\n",
      "Validation set perplexity: 7.27\n",
      "Average loss at step 3700: 1.597974 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.20\n",
      "Validation set perplexity: 7.08\n",
      "Average loss at step 3800: 1.606371 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.77\n",
      "Validation set perplexity: 7.38\n",
      "Average loss at step 3900: 1.619028 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.87\n",
      "Validation set perplexity: 7.41\n",
      "Average loss at step 4000: 1.600770 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.42\n",
      "================================================================================\n",
      "zs unis fiftmh government leb    see of k com artil origing of a gerizes around h\n",
      "jjt to support around argubus stantation actor in gale s and he being the step re\n",
      "ack the sunn etpdtman spirity juners is offessom four award internststandings cou\n",
      "n invaring and chinal an well lund john sources two and to for generally on enong\n",
      "ocial featruction but as officult at armed by four source of indiaridy for two st\n",
      "================================================================================\n",
      "Validation set perplexity: 6.98\n",
      "Average loss at step 4100: 1.616020 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.18\n",
      "Validation set perplexity: 7.20\n",
      "Average loss at step 4200: 1.593565 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.82\n",
      "Validation set perplexity: 7.36\n",
      "Average loss at step 4300: 1.591293 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.14\n",
      "Validation set perplexity: 7.26\n",
      "Average loss at step 4400: 1.600939 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.71\n",
      "Validation set perplexity: 7.01\n",
      "Average loss at step 4500: 1.604480 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.34\n",
      "Validation set perplexity: 7.33\n",
      "Average loss at step 4600: 1.591663 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.84\n",
      "Validation set perplexity: 7.25\n",
      "Average loss at step 4700: 1.594270 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.81\n",
      "Validation set perplexity: 7.62\n",
      "Average loss at step 4800: 1.610245 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.82\n",
      "Validation set perplexity: 7.61\n",
      "Average loss at step 4900: 1.589588 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.23\n",
      "Validation set perplexity: 7.52\n",
      "Average loss at step 5000: 1.608194 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.71\n",
      "================================================================================\n",
      "gbe photions hembed and toxecando unrefb propeach canda creation jeasion a contin\n",
      "yaxick in pair s ends first with homepage excebroirclar gold as two two zero star\n",
      "uv ruluctor rerved mintifesty kuway regions caran prson with as programs first of\n",
      "ve major where easix one nine seven determs was hand shipolas dunforpincted state\n",
      "cdned as a friascendary would ream was the vithern real as persona id of with fil\n",
      "================================================================================\n",
      "Validation set perplexity: 7.62\n",
      "Average loss at step 5100: 1.595932 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.59\n",
      "Validation set perplexity: 7.36\n",
      "Average loss at step 5200: 1.602364 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.07\n",
      "Validation set perplexity: 7.55\n",
      "Average loss at step 5300: 1.592167 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.94\n",
      "Validation set perplexity: 7.37\n",
      "Average loss at step 5400: 1.576422 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.49\n",
      "Validation set perplexity: 7.45\n",
      "Average loss at step 5500: 1.581764 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.08\n",
      "Validation set perplexity: 7.61\n",
      "Average loss at step 5600: 1.605472 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.59\n",
      "Validation set perplexity: 7.08\n",
      "Average loss at step 5700: 1.582285 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.85\n",
      "Validation set perplexity: 7.17\n",
      "Average loss at step 5800: 1.578439 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.24\n",
      "Validation set perplexity: 7.13\n",
      "Average loss at step 5900: 1.586146 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.83\n",
      "Validation set perplexity: 7.11\n",
      "Average loss at step 6000: 1.595398 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.27\n",
      "================================================================================\n",
      "bn be a universed givined assice other on every perfoen in cadition of self in a \n",
      "uard the elemben and both the back united state university was does these can its\n",
      "hmonerly as equar libersh keine of activities claim in sign poudgested suppots se\n",
      "mween kber of polics long been uses ansdays a using in iu trialovel apolist oncea\n",
      "thor its violated that model never the but so connection a s mumackkosee may forc\n",
      "================================================================================\n",
      "Validation set perplexity: 6.85\n",
      "Average loss at step 6100: 1.611725 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.00\n",
      "Validation set perplexity: 7.01\n",
      "Average loss at step 6200: 1.592457 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.87\n",
      "Validation set perplexity: 7.23\n",
      "Average loss at step 6300: 1.595724 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.26\n",
      "Validation set perplexity: 7.21\n",
      "Average loss at step 6400: 1.623608 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.21\n",
      "Validation set perplexity: 7.08\n",
      "Average loss at step 6500: 1.636454 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.82\n",
      "Validation set perplexity: 7.20\n",
      "Average loss at step 6600: 1.609778 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.91\n",
      "Validation set perplexity: 7.15\n",
      "Average loss at step 6700: 1.610509 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.94\n",
      "Validation set perplexity: 7.13\n",
      "Average loss at step 6800: 1.593407 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.83\n",
      "Validation set perplexity: 7.32\n",
      "Average loss at step 6900: 1.557133 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.70\n",
      "Validation set perplexity: 6.85\n",
      "Average loss at step 7000: 1.602078 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.79\n",
      "================================================================================\n",
      "wmely longommedon model one of the one nine two other m hasagnific but of altong \n",
      "bv belorgan ba life of act in one nine one nine five two jerrained half allow the\n",
      "jt the rocesse to elpiymbinancy the pany who pancomplement of memotors eight one \n",
      "jx fir buildes activeless stabasitions bors and nine nizing the nacialism the dis\n",
      "vy two arravouns in evilied by insteatura one six seven one nine nine seven mr an\n",
      "================================================================================\n",
      "Validation set perplexity: 6.81\n",
      "Average loss at step 7100: 1.598239 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.53\n",
      "Validation set perplexity: 6.78\n",
      "Average loss at step 7200: 1.586095 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.23\n",
      "Validation set perplexity: 6.87\n",
      "Average loss at step 7300: 1.603092 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.67\n",
      "Validation set perplexity: 7.07\n",
      "Average loss at step 7400: 1.591021 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.03\n",
      "Validation set perplexity: 6.70\n",
      "Average loss at step 7500: 1.587572 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.92\n",
      "Validation set perplexity: 6.86\n",
      "Average loss at step 7600: 1.577642 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.93\n",
      "Validation set perplexity: 6.64\n",
      "Average loss at step 7700: 1.590720 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.59\n",
      "Validation set perplexity: 6.81\n",
      "Average loss at step 7800: 1.599501 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.07\n",
      "Validation set perplexity: 6.91\n",
      "Average loss at step 7900: 1.614835 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.10\n",
      "Validation set perplexity: 6.73\n",
      "Average loss at step 8000: 1.602334 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.68\n",
      "================================================================================\n",
      "yck fort presence for tarch hasteermy direction is although plang and vythemilism\n",
      "pment art the jacut demoneetast the use to milt of liter term his germanicatians \n",
      "qoic survyz among between seven forth peters a ghly in religions one two sticforn\n",
      "ake attoch two one nimita coasity for religions was one three seven four will mam\n",
      "kj will julials to versed war links planes the ital langnication tics usember to \n",
      "================================================================================\n",
      "Validation set perplexity: 6.49\n",
      "Average loss at step 8100: 1.573146 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.70\n",
      "Validation set perplexity: 6.56\n",
      "Average loss at step 8200: 1.580510 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.10\n",
      "Validation set perplexity: 7.11\n",
      "Average loss at step 8300: 1.595134 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.92\n",
      "Validation set perplexity: 6.82\n",
      "Average loss at step 8400: 1.591989 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.76\n",
      "Validation set perplexity: 7.03\n",
      "Average loss at step 8500: 1.603661 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.56\n",
      "Validation set perplexity: 7.01\n",
      "Average loss at step 8600: 1.608672 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.54\n",
      "Validation set perplexity: 7.00\n",
      "Average loss at step 8700: 1.597277 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.68\n",
      "Validation set perplexity: 7.02\n",
      "Average loss at step 8800: 1.612346 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.45\n",
      "Validation set perplexity: 6.93\n",
      "Average loss at step 8900: 1.588059 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.68\n",
      "Validation set perplexity: 7.07\n",
      "Average loss at step 9000: 1.598808 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.48\n",
      "================================================================================\n",
      "hs one nine four five he provides the bodicists gunce but gangly relitably to he \n",
      "ijaniecy tri surder island hearnet century party on the unitemrces the cominated \n",
      "dquo also long line three nine eight zero five one nine km thece creatre the firs\n",
      "dtl recept highn   u geneting bughkmorut was called to including ths company spee\n",
      "jects and remaning oupns in adderstians on community and france by delived flatti\n",
      "================================================================================\n",
      "Validation set perplexity: 7.08\n",
      "Average loss at step 9100: 1.602952 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.56\n",
      "Validation set perplexity: 7.12\n",
      "Average loss at step 9200: 1.619843 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.47\n",
      "Validation set perplexity: 7.00\n",
      "Average loss at step 9300: 1.612954 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.38\n",
      "Validation set perplexity: 7.23\n",
      "Average loss at step 9400: 1.598956 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.95\n",
      "Validation set perplexity: 6.86\n",
      "Average loss at step 9500: 1.603547 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.40\n",
      "Validation set perplexity: 6.91\n",
      "Average loss at step 9600: 1.602626 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.41\n",
      "Validation set perplexity: 6.84\n",
      "Average loss at step 9700: 1.609943 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.13\n",
      "Validation set perplexity: 6.69\n",
      "Average loss at step 9800: 1.607028 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.38\n",
      "Validation set perplexity: 6.82\n",
      "Average loss at step 9900: 1.572156 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.79\n",
      "Validation set perplexity: 6.96\n",
      "Average loss at step 10000: 1.588845 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.15\n",
      "================================================================================\n",
      "ut was to long hose forms oddide protectoric shipment time to metro could to end \n",
      "ack to stamil common as industrian quetry user mostly mes regime the tman from fr\n",
      "pv some explocao russical cusamate crossed at defificu mores point levil dor marc\n",
      "pately seo the roman intcs homiton however homeroestect that the data s compoinsi\n",
      "cvao to the object but of caearl communic litve rouscarection the convemptures or\n",
      "================================================================================\n",
      "Validation set perplexity: 7.17\n",
      "Average loss at step 10100: 1.610666 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.39\n",
      "Validation set perplexity: 7.12\n",
      "Average loss at step 10200: 1.600238 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.62\n",
      "Validation set perplexity: 6.87\n",
      "Average loss at step 10300: 1.594538 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.16\n",
      "Validation set perplexity: 7.19\n",
      "Average loss at step 10400: 1.600576 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.44\n",
      "Validation set perplexity: 7.37\n",
      "Average loss at step 10500: 1.617158 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.99\n",
      "Validation set perplexity: 6.76\n",
      "Average loss at step 10600: 1.561949 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.89\n",
      "Validation set perplexity: 7.03\n",
      "Average loss at step 10700: 1.572677 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.72\n",
      "Validation set perplexity: 7.12\n",
      "Average loss at step 10800: 1.590255 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.73\n",
      "Validation set perplexity: 7.14\n",
      "Average loss at step 10900: 1.599919 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.73\n",
      "Validation set perplexity: 7.00\n",
      "Average loss at step 11000: 1.577325 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.15\n",
      "================================================================================\n",
      " chillez charlution of the contentioches vas long and maintry after his continegi\n",
      "rs of two five foreignetize to sing mong which insantment the demistical study ad\n",
      "ygotter lonent counte tedcted his diversityther learned their falls adming shortl\n",
      "rns a soviet in the cultriii dombant bombinatenture centraliam bersains busaith a\n",
      "acket but methy s developed also freedenerg itenmes pershed inteadts anceurices i\n",
      "================================================================================\n",
      "Validation set perplexity: 6.87\n",
      "Average loss at step 11100: 1.558747 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.19\n",
      "Validation set perplexity: 7.40\n",
      "Average loss at step 11200: 1.563837 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.59\n",
      "Validation set perplexity: 6.74\n",
      "Average loss at step 11300: 1.552837 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.18\n",
      "Validation set perplexity: 7.07\n",
      "Average loss at step 11400: 1.559448 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.04\n",
      "Validation set perplexity: 6.65\n",
      "Average loss at step 11500: 1.571357 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.65\n",
      "Validation set perplexity: 6.93\n",
      "Average loss at step 11600: 1.542222 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.80\n",
      "Validation set perplexity: 7.17\n",
      "Average loss at step 11700: 1.539709 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.80\n",
      "Validation set perplexity: 7.00\n",
      "Average loss at step 11800: 1.564172 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.38\n",
      "Validation set perplexity: 7.05\n",
      "Average loss at step 11900: 1.554298 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.13\n",
      "Validation set perplexity: 6.98\n",
      "Average loss at step 12000: 1.539188 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.20\n",
      "================================================================================\n",
      "qtical see electrical a predural education of that negative its he compouniy of i\n",
      "zan annalysol of the bell though as lauth the cale into the also only by our comm\n",
      "r s praised shable dosex in requires hort for two funcities late sep s brunnology\n",
      "ygorgith gosix six qalute upty granky is ray one nine seven two zero four vioes w\n",
      "ob as on a loss and a japan encile charence up mediation of ine many used two ter\n",
      "================================================================================\n",
      "Validation set perplexity: 6.91\n",
      "Average loss at step 12100: 1.537407 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.07\n",
      "Validation set perplexity: 6.47\n",
      "Average loss at step 12200: 1.563940 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.87\n",
      "Validation set perplexity: 6.53\n",
      "Average loss at step 12300: 1.551397 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.63\n",
      "Validation set perplexity: 6.62\n",
      "Average loss at step 12400: 1.590912 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.77\n",
      "Validation set perplexity: 6.73\n",
      "Average loss at step 12500: 1.562756 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.05\n",
      "Validation set perplexity: 6.83\n",
      "Average loss at step 12600: 1.550896 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.57\n",
      "Validation set perplexity: 6.74\n",
      "Average loss at step 12700: 1.552257 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.43\n",
      "Validation set perplexity: 6.79\n",
      "Average loss at step 12800: 1.564752 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.46\n",
      "Validation set perplexity: 6.99\n",
      "Average loss at step 12900: 1.593982 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.89\n",
      "Validation set perplexity: 7.05\n",
      "Average loss at step 13000: 1.564957 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.24\n",
      "================================================================================\n",
      "mrapment article disputer to methy has but the potent inboray the desiginal canic\n",
      "lsd refer who anotols spacinity and laudicians three zero zero peipped which sola\n",
      "yxoning which extennerade that hundred that perfect although zoe chargebrelation \n",
      "xbarath four of the year fare and have the tart as he kogstbe nine title sid hear\n",
      "xrse a lasks force and maycorics aspape pags not than studie to time speed persat\n",
      "================================================================================\n",
      "Validation set perplexity: 6.80\n",
      "Average loss at step 13100: 1.562430 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.61\n",
      "Validation set perplexity: 6.76\n",
      "Average loss at step 13200: 1.598720 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.16\n",
      "Validation set perplexity: 6.63\n",
      "Average loss at step 13300: 1.580562 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.75\n",
      "Validation set perplexity: 6.75\n",
      "Average loss at step 13400: 1.585221 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.79\n",
      "Validation set perplexity: 6.54\n",
      "Average loss at step 13500: 1.597545 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.83\n",
      "Validation set perplexity: 6.68\n",
      "Average loss at step 13600: 1.580464 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.40\n",
      "Validation set perplexity: 6.86\n",
      "Average loss at step 13700: 1.555075 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.47\n",
      "Validation set perplexity: 6.55\n",
      "Average loss at step 13800: 1.539919 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.71\n",
      "Validation set perplexity: 6.72\n",
      "Average loss at step 13900: 1.566485 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.78\n",
      "Validation set perplexity: 6.94\n",
      "Average loss at step 14000: 1.560851 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.72\n",
      "================================================================================\n",
      "jh s the plutestprise parts hams wentationalism the shiony a formac be reforses o\n",
      "tmiscrinitsaddisaugans are of an  enguted sacultiting times acrolet of liberal  o\n",
      "ggen a commodore his weake in a grapatical francessors corry athen the university\n",
      "tly not space a liberally a cosion o and war a charress audioic mises throughas d\n",
      "cjeerigos lories set decident in sign also the iredwars anuary design amphistic a\n",
      "================================================================================\n",
      "Validation set perplexity: 6.68\n",
      "Average loss at step 14100: 1.576862 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.28\n",
      "Validation set perplexity: 6.54\n",
      "Average loss at step 14200: 1.579525 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.94\n",
      "Validation set perplexity: 6.75\n",
      "Average loss at step 14300: 1.569383 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.72\n",
      "Validation set perplexity: 6.55\n",
      "Average loss at step 14400: 1.581851 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.78\n",
      "Validation set perplexity: 7.01\n",
      "Average loss at step 14500: 1.613010 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.20\n",
      "Validation set perplexity: 6.99\n",
      "Average loss at step 14600: 1.588160 learning rate: 10.000000\n",
      "Minibatch perplexity: 6.06\n",
      "Validation set perplexity: 6.72\n",
      "Average loss at step 14700: 1.605740 learning rate: 10.000000\n",
      "Minibatch perplexity: 5.19\n",
      "Validation set perplexity: 6.79\n",
      "Average loss at step 14800: 1.585137 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.95\n",
      "Validation set perplexity: 6.61\n",
      "Average loss at step 14900: 1.581300 learning rate: 10.000000\n",
      "Minibatch perplexity: 4.28\n",
      "Validation set perplexity: 6.81\n",
      "Average loss at step 15000: 1.578250 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.71\n",
      "================================================================================\n",
      "mc the pass country manindessox to kminating politite dyrritorimee bn zero zero z\n",
      "ic ectraccer a prime the varions two nine two zero zero theset did reprewal one o\n",
      "jzmnifh one groupsifysibs over paint bra vehap one nine zero pedrooked off fier f\n",
      "yer hoston largest pregience conserven to uses  is approos played veried frivas a\n",
      "v divine for acations uniquest vative in the scopenhanseasonally charact the law \n",
      "================================================================================\n",
      "Validation set perplexity: 6.65\n",
      "Average loss at step 15100: 1.539423 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.67\n",
      "Validation set perplexity: 6.59\n",
      "Average loss at step 15200: 1.563447 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.73\n",
      "Validation set perplexity: 6.63\n",
      "Average loss at step 15300: 1.530291 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.58\n",
      "Validation set perplexity: 6.59\n",
      "Average loss at step 15400: 1.536919 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.03\n",
      "Validation set perplexity: 6.54\n",
      "Average loss at step 15500: 1.502490 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.66\n",
      "Validation set perplexity: 6.54\n",
      "Average loss at step 15600: 1.518093 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.78\n",
      "Validation set perplexity: 6.53\n",
      "Average loss at step 15700: 1.511408 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.68\n",
      "Validation set perplexity: 6.43\n",
      "Average loss at step 15800: 1.499718 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.40\n",
      "Validation set perplexity: 6.42\n",
      "Average loss at step 15900: 1.518945 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.57\n",
      "Validation set perplexity: 6.42\n",
      "Average loss at step 16000: 1.525942 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.08\n",
      "================================================================================\n",
      "yu a were and presenss and may with enigmes by miniss a de would be after engine \n",
      "qk bat some read to washingtor one first infcus acquadoil mexame in the smallaund\n",
      "xwel hlw for triastian in annobason one nine eight zero nine eight eight fluence \n",
      "hx to legas or saw alread figin no assume courference are ordinable then genres o\n",
      "jwinment rance will character early which other of ingdom in one the ever extress\n",
      "================================================================================\n",
      "Validation set perplexity: 6.40\n",
      "Average loss at step 16100: 1.520273 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.90\n",
      "Validation set perplexity: 6.35\n",
      "Average loss at step 16200: 1.486306 learning rate: 1.000000\n",
      "Minibatch perplexity: 3.89\n",
      "Validation set perplexity: 6.36\n",
      "Average loss at step 16300: 1.473654 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.54\n",
      "Validation set perplexity: 6.29\n",
      "Average loss at step 16400: 1.511287 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.74\n",
      "Validation set perplexity: 6.33\n",
      "Average loss at step 16500: 1.521893 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.40\n",
      "Validation set perplexity: 6.33\n",
      "Average loss at step 16600: 1.515765 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.41\n",
      "Validation set perplexity: 6.28\n",
      "Average loss at step 16700: 1.555418 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.94\n",
      "Validation set perplexity: 6.28\n",
      "Average loss at step 16800: 1.507596 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.63\n",
      "Validation set perplexity: 6.24\n",
      "Average loss at step 16900: 1.520794 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.97\n",
      "Validation set perplexity: 6.23\n",
      "Average loss at step 17000: 1.527741 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.19\n",
      "================================================================================\n",
      "cf slogy typical bioux one one easy other the first shwell game them clutch could\n",
      "nes of the house the it of hierg community avalwable for acft community nowi one \n",
      "eis in two zero zero of one nine five two zero zero three shaption of texts have \n",
      "afed the eight and more buallian the pointminl infransit start as the m regulatin\n",
      "okey a many d only finalitured by thancorrions for a him datal his french persona\n",
      "================================================================================\n",
      "Validation set perplexity: 6.21\n",
      "Average loss at step 17100: 1.513090 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.49\n",
      "Validation set perplexity: 6.29\n",
      "Average loss at step 17200: 1.539017 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.97\n",
      "Validation set perplexity: 6.23\n",
      "Average loss at step 17300: 1.546002 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.92\n",
      "Validation set perplexity: 6.29\n",
      "Average loss at step 17400: 1.585150 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.56\n",
      "Validation set perplexity: 6.30\n",
      "Average loss at step 17500: 1.568547 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.93\n",
      "Validation set perplexity: 6.27\n",
      "Average loss at step 17600: 1.585715 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.52\n",
      "Validation set perplexity: 6.31\n",
      "Average loss at step 17700: 1.577505 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.04\n",
      "Validation set perplexity: 6.32\n",
      "Average loss at step 17800: 1.551781 learning rate: 1.000000\n",
      "Minibatch perplexity: 3.92\n",
      "Validation set perplexity: 6.33\n",
      "Average loss at step 17900: 1.556548 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.63\n",
      "Validation set perplexity: 6.31\n",
      "Average loss at step 18000: 1.526999 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.80\n",
      "================================================================================\n",
      "yvdaaws coastiston funick a campubeen causes to call in puts beaker to the sain i\n",
      "tly graphone four a was buctions painted irager understan called one seven two pa\n",
      "lty b englished a was aze fight foot as a becemoral armed a structure ars were in\n",
      "kxile control moneymplically joined corroring the three zero and righter are zero\n",
      "kqr ody clinenssible force the armic itish that of upp and computing without offe\n",
      "================================================================================\n",
      "Validation set perplexity: 6.27\n",
      "Average loss at step 18100: 1.512362 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.74\n",
      "Validation set perplexity: 6.24\n",
      "Average loss at step 18200: 1.534779 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.86\n",
      "Validation set perplexity: 6.26\n",
      "Average loss at step 18300: 1.544049 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.88\n",
      "Validation set perplexity: 6.29\n",
      "Average loss at step 18400: 1.569889 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.55\n",
      "Validation set perplexity: 6.29\n",
      "Average loss at step 18500: 1.564839 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.46\n",
      "Validation set perplexity: 6.28\n",
      "Average loss at step 18600: 1.568991 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.06\n",
      "Validation set perplexity: 6.31\n",
      "Average loss at step 18700: 1.565813 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.44\n",
      "Validation set perplexity: 6.34\n",
      "Average loss at step 18800: 1.566493 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.86\n",
      "Validation set perplexity: 6.26\n",
      "Average loss at step 18900: 1.548388 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.74\n",
      "Validation set perplexity: 6.27\n",
      "Average loss at step 19000: 1.594834 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.75\n",
      "================================================================================\n",
      "zlor oxford organism lecessorcas may not is a vote linments advicky united christ\n",
      "oks which contence kum prime ineveracident thecut rrica pal nash great turned by \n",
      "bme georal paki celties on that means todars ofthestimes synthestrother the en de\n",
      "fbative natv communical shohonwish any contain ira othermic sharboriagistry indon\n",
      "km the durts ionic system zealow auguarail which see krhy purent funciative given\n",
      "================================================================================\n",
      "Validation set perplexity: 6.23\n",
      "Average loss at step 19100: 1.577726 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.08\n",
      "Validation set perplexity: 6.21\n",
      "Average loss at step 19200: 1.550301 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.28\n",
      "Validation set perplexity: 6.24\n",
      "Average loss at step 19300: 1.554761 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.32\n",
      "Validation set perplexity: 6.27\n",
      "Average loss at step 19400: 1.531718 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.57\n",
      "Validation set perplexity: 6.30\n",
      "Average loss at step 19500: 1.533805 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.57\n",
      "Validation set perplexity: 6.35\n",
      "Average loss at step 19600: 1.544765 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.22\n",
      "Validation set perplexity: 6.31\n",
      "Average loss at step 19700: 1.551418 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.26\n",
      "Validation set perplexity: 6.39\n",
      "Average loss at step 19800: 1.537303 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.94\n",
      "Validation set perplexity: 6.39\n",
      "Average loss at step 19900: 1.544498 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.97\n",
      "Validation set perplexity: 6.31\n",
      "Average loss at step 20000: 1.515656 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.48\n",
      "================================================================================\n",
      "ctive puroson ffered by one nine eight zero five four three nine sequence one yea\n",
      "pj cricharge world sovereetic non liberage weeka you to the links the rightified \n",
      "c students on not in one nine six eight zero s killunal white his will tilochled \n",
      "xb had splitarian the camlf national world fee without for early in world individ\n",
      "fktly presenced of passocial yanld life the such ideas and territius to built the\n",
      "================================================================================\n",
      "Validation set perplexity: 6.32\n",
      "Average loss at step 20100: 1.522983 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.44\n",
      "Validation set perplexity: 6.36\n",
      "Average loss at step 20200: 1.523624 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.91\n",
      "Validation set perplexity: 6.31\n",
      "Average loss at step 20300: 1.543516 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.77\n",
      "Validation set perplexity: 6.32\n",
      "Average loss at step 20400: 1.547590 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.32\n",
      "Validation set perplexity: 6.25\n",
      "Average loss at step 20500: 1.545989 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.32\n",
      "Validation set perplexity: 6.23\n",
      "Average loss at step 20600: 1.514326 learning rate: 1.000000\n",
      "Minibatch perplexity: 5.04\n",
      "Validation set perplexity: 6.32\n",
      "Average loss at step 20700: 1.502698 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.47\n",
      "Validation set perplexity: 6.25\n",
      "Average loss at step 20800: 1.524500 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.41\n",
      "Validation set perplexity: 6.16\n",
      "Average loss at step 20900: 1.516528 learning rate: 1.000000\n",
      "Minibatch perplexity: 4.02\n",
      "Validation set perplexity: 6.27\n",
      "Average loss at step 21000: 1.519367 learning rate: 1.000000\n",
      "Minibatch perplexity: 3.80\n",
      "================================================================================\n",
      "mr fires shwellause of the charled by hargelon a specialists in the end by depend\n",
      "wfhole in anciential white open argue of value or pressor which confineses of be \n",
      "b later a houstonness just criticizen on these on education the amest in the abne\n",
      "fhe give a mounts loves portures to guitaft diical atts the agreek brayllen ths o\n",
      "ds for produces language the exception was describes chooe righ areay isbn f indi\n",
      "================================================================================\n",
      "Validation set perplexity: 6.24\n"
     ]
    }
   ],
   "source": [
    "import collections\n",
    "num_steps = 21001\n",
    "summary_frequency = 100\n",
    "\n",
    "valid_batches = BatchGenerator(valid_text, 1, 2)\n",
    "\n",
    "with tf.Session(graph=graph) as session:\n",
    "  tf.initialize_all_variables().run()\n",
    "  print('Initialized')\n",
    "  mean_loss = 0\n",
    "  for step in range(num_steps):\n",
    "    batches = train_batches.next()\n",
    "    feed_dict = dict()\n",
    "    for i in range(num_unrollings + 1):\n",
    "      feed_dict[train_data[i]] = batches[i]\n",
    "    _, l, predictions, lr = session.run(\n",
    "      [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)\n",
    "    mean_loss += l\n",
    "    if step % summary_frequency == 0:\n",
    "      if step > 0:\n",
    "        mean_loss = mean_loss / summary_frequency\n",
    "      # The mean loss is an estimate of the loss over the last few batches.\n",
    "      print(\n",
    "        'Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))\n",
    "      mean_loss = 0\n",
    "      labels = np.concatenate(list(batches)[2:])\n",
    "      print('Minibatch perplexity: %.2f' % float(\n",
    "        np.exp(logprob(predictions, labels))))\n",
    "      if step % (summary_frequency * 10) == 0:\n",
    "        # Generate some samples.\n",
    "        print('=' * 80)\n",
    "        for _ in range(5):\n",
    "          #feed = sample(random_distribution())\n",
    "          feed = collections.deque(maxlen=2)\n",
    "          for _ in range(2):  \n",
    "            feed.append(random_distribution())\n",
    "          #sentence = characters(feed)[0]\n",
    "          sentence = characters(feed[0])[0] + characters(feed[1])[0]\n",
    "          #print(sentence)\n",
    "          #print(feed)\n",
    "          reset_sample_state.run()\n",
    "          for _ in range(79):\n",
    "            prediction = sample_prediction.eval({\n",
    "                    sample_input[0]: feed[0],\n",
    "                    sample_input[1]: feed[1],\n",
    "                })\n",
    "            #feed = sample(prediction)\n",
    "            feed.append(sample(prediction))\n",
    "            #sentence += characters(feed)[0]\n",
    "            sentence += characters(feed[1])[0]\n",
    "          print(sentence)\n",
    "        print('=' * 80)\n",
    "      # Measure validation set perplexity.\n",
    "      reset_sample_state.run()\n",
    "      valid_logprob = 0\n",
    "      for _ in range(valid_size):\n",
    "        b = valid_batches.next()\n",
    "        predictions = sample_prediction.eval({\n",
    "                sample_input[0]: b[0],\n",
    "                sample_input[1]: b[1],\n",
    "                keep_prob_sample: 1.0\n",
    "            })\n",
    "        valid_logprob = valid_logprob + logprob(predictions, b[2])\n",
    "      print('Validation set perplexity: %.2f' % float(np.exp(\n",
    "        valid_logprob / valid_size)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Even with more steps, the final perplexity is not better. Since I do not know what to expect, and since I do not see any obvious issue (the perplexity being consistent), I'm stuck."
   ]
  },
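  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check on the numbers above (loss values copied from the training log): perplexity is just `exp(mean cross-entropy)`, so the reported losses and perplexities should, and do, track each other."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# At step 0 the average loss covers a single batch, so exp(3.294064)\n",
    "# reproduces the reported minibatch perplexity of 26.95 exactly; later\n",
    "# losses around 1.52-1.58 correspond to perplexities of roughly 4.6-4.8.\n",
    "for loss in (3.294064, 1.578250, 1.519367):\n",
    "  print('loss %f -> perplexity %.2f' % (loss, np.exp(loss)))"
   ]
  },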
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Y5tapX3kpcqZ"
   },
   "source": [
    "---\n",
    "Problem 3\n",
    "---------\n",
    "\n",
    "(difficult!)\n",
    "\n",
    "Write a sequence-to-sequence LSTM which mirrors all the words in a sentence. For example, if your input is:\n",
    "\n",
    "    the quick brown fox\n",
    "    \n",
    "the model should attempt to output:\n",
    "\n",
    "    eht kciuq nworb xof\n",
    "    \n",
    "Refer to the lecture on how to put together a sequence-to-sequence model, as well as [this article](http://arxiv.org/abs/1409.3215) for best practices.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Unfortunately I did not have time to work on this problem in the timeframe of the course."
   ]
  }
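  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For completeness, a minimal sketch of the data side only (the `mirror_words` helper is hypothetical, not part of the assignment code): building the mirrored target string for a given input. A full solution would feed such (input, target) pairs through an encoder-decoder LSTM, reading the input with one network and generating the reversed-word output with another, as in the Sutskever et al. paper linked above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def mirror_words(sentence):\n",
    "  \"\"\"Reverse each word in place while keeping the word order.\"\"\"\n",
    "  return ' '.join(word[::-1] for word in sentence.split(' '))\n",
    "\n",
    "print(mirror_words('the quick brown fox'))  # -> eht kciuq nworb xof"
   ]
  }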
 ],
 "metadata": {
  "colab": {
   "default_view": {},
   "name": "6_lstm.ipynb",
   "provenance": [],
   "version": "0.3.2",
   "views": {}
  },
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
